Method and system for synchronizing a virtual file system at a computing device with a storage device

ABSTRACT

A method of resolving conflicts between revisions to a distributed virtual file system is implemented at a computing device that is communicatively connected to a plurality of storage devices. The virtual file system at the computing device has a first revision of the virtual file system. Upon receipt of a request to synchronize the first revision of the virtual file system with the storage devices, the computing device retrieves one or more blocks from the storage devices, which are associated with a second revision of the virtual file system. The computing device then merges a first component of the first revision with a corresponding component of the second revision if a first predefined condition is met or identifies a second component of the first revision as being conflicted with a corresponding component of the second revision if a second predefined set of conditions is met.

RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 12/954,414, filed Nov. 24, 2010, which is a continuation application of U.S. patent application Ser. No. 12/994,444 filed on Nov. 23, 2010, which is a National Stage Application filed under 35 U.S.C. §371 of PCT Patent Application Serial No. PCT/CN2010/076437 filed on Aug. 27, 2010, which claims the benefit of and priority to U.S. Provisional Patent Application No. 61/237,902, “Distributed fault-tolerant content addressable storage based file system with revision control utilizing heterogeneous data storage devices”, filed on Aug. 28, 2009, all of which are hereby incorporated by reference in their entirety.

TECHNICAL FIELD

The disclosed embodiments relate generally to cloud storage, and more specifically to a method and system for managing a distributed storage system through a virtual file system.

BACKGROUND

Cloud computing is regarded as a paradigm shift from mainframe or client-server based information services. Because the details of the technology infrastructure that supports cloud computing are hidden “in the cloud,” a user who uses the services based on cloud computing is not required to have expertise in, or control over, the technology infrastructure.

Among the cloud computing based services, cloud storage is one that receives more and more attention with the dramatic expansion of data accumulation speed even at an individual person's level. For example, most of today's smart phones are equipped with a digital camera or even a video camera for generating high-resolution multimedia content. Thus, the large volume of data generated by a user of a smart phone can easily fill up its local storage space (e.g., a flash memory card) within a short period of time, as well as other local storage devices (e.g., a computer hard drive) that the user has access to. To avoid potential data loss due to a fatal device failure, the user may have to install a special software application on its computer to manage the large volume of data, e.g., moving the data from one device to another device or replicating data to ensure reliability. This process is often tedious and time-consuming.

A cloud storage based solution addresses this data explosion problem by offering an Internet-based storage service with a web interface through which different subscribers can upload their data into remote storage devices managed by a third party that has the technology and resources for maintaining the integrity of the uploaded data. But because different third parties often use different technologies, it remains a challenge for an individual user to integrate the cloud storage and the local storage in a streamlined fashion.

SUMMARY

The above deficiencies and other problems associated with integrating the cloud storage and the local storage in a streamlined fashion are addressed by the disclosed embodiments.

In accordance with some embodiments, a computer-implemented method for forming a virtual file system associated with a distributed storage system is implemented at a computing device having one or more processors and memory. The memory stores one or more programs for execution by the one or more processors on the computing device, which is associated with a distributed storage system that includes a plurality of storage devices.

The computing device receives a request for forming a virtual file system, which is associated with a plurality of storage devices. The computing device retrieves one or more metadata blocks from the plurality of storage devices, each metadata block including metadata associated with a respective component of the virtual file system. The computing device renders a commit tree for the virtual file system by processing the retrieved metadata blocks in a predefined order. In some embodiments, the commit tree includes a plurality of directory nodes and file nodes, each directory node or file node having metadata corresponding to a respective directory or file of the virtual file system. The computing device builds an instance of the virtual file system by traversing the plurality of directory nodes and file nodes in a recursive manner. In some embodiments, for a respective directory node, the computing device creates a directory in accordance with the metadata associated with the directory node; for a respective file node, the computing device retrieves one or more data blocks from the plurality of storage devices in accordance with the metadata associated with the file node and creates a file using the retrieved data blocks.
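The recursive traversal described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the `DirNode` and `FileNode` classes, the `block_ids` field, and the `fetch_block` callback are hypothetical names introduced only for illustration, and a plain dictionary stands in for the plurality of storage devices.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class FileNode:
    name: str
    block_ids: list  # hypothetical metadata: ordered IDs of the file's data blocks


@dataclass
class DirNode:
    name: str
    children: list = field(default_factory=list)


def build_instance(node, parent_path, fetch_block):
    """Recursively materialize a commit tree under parent_path."""
    path = os.path.join(parent_path, node.name)
    if isinstance(node, DirNode):
        # Directory node: create the directory, then recurse into its children.
        os.makedirs(path, exist_ok=True)
        for child in node.children:
            build_instance(child, path, fetch_block)
    else:
        # File node: retrieve each data block and write the blocks in order.
        with open(path, "wb") as f:
            for block_id in node.block_ids:
                f.write(fetch_block(block_id))
    return path


# Usage: a dict stands in for the storage devices (an assumption of this sketch).
blocks = {"b1": b"hello ", "b2": b"world"}
tree = DirNode("root", [
    DirNode("docs", [FileNode("a.txt", ["b1", "b2"])]),
    FileNode("empty.txt", []),
])
root = build_instance(tree, tempfile.mkdtemp(), blocks.__getitem__)
```

Passing the block source as a callback keeps the traversal independent of where the data blocks actually reside.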

In accordance with some embodiments, a computing device in association with a distributed storage system that includes a plurality of storage devices includes one or more processors, memory, and one or more programs stored in the memory for execution by the one or more processors. The one or more programs include instructions for: receiving a request for forming a virtual file system, wherein the virtual file system is associated with a plurality of storage devices; retrieving one or more metadata blocks from the plurality of storage devices, wherein each metadata block includes metadata associated with a respective component of the virtual file system; rendering a commit tree for the virtual file system by processing the retrieved metadata blocks in a predefined order, wherein the commit tree includes a plurality of directory nodes and file nodes, each directory node or file node having metadata corresponding to a respective directory or file of the virtual file system; and building an instance of the virtual file system by traversing the plurality of directory nodes and file nodes in a recursive manner, further including: creating a directory in accordance with the metadata associated with a respective directory node, and retrieving one or more data blocks from the plurality of storage devices in accordance with the metadata associated with a respective file node and creating a file using the retrieved data blocks.

In accordance with some embodiments, a computer readable storage medium stores one or more programs configured for execution by a computing device having one or more processors and memory storing one or more programs for execution by the one or more processors in association with a distributed storage system that includes a plurality of storage devices. The one or more programs comprise instructions to: receive a request for forming a virtual file system, wherein the virtual file system is associated with a plurality of storage devices; retrieve one or more metadata blocks from the plurality of storage devices, wherein each metadata block includes metadata associated with a respective component of the virtual file system; create a commit tree for the virtual file system by processing the retrieved metadata blocks in a predefined order, wherein the commit tree includes a plurality of directory nodes and file nodes, each directory node or file node having metadata corresponding to a respective directory or file of the virtual file system; and build an instance of the virtual file system by traversing the plurality of directory nodes and file nodes in a recursive manner, further including: creating a directory in accordance with the metadata associated with a respective directory node, and retrieving one or more data blocks from the plurality of storage devices in accordance with the metadata associated with a respective file node and creating a file using the retrieved data blocks.

In accordance with some embodiments, a computer-implemented method for fetching data associated with a file from a distributed storage system is implemented at a computing device having one or more processors and memory. The memory stores one or more programs for execution by the one or more processors on the computing device, which is associated with a distributed storage system that includes a plurality of storage devices.

The computing device receives from an application a file request through an instance of a virtual file system. In some embodiments, the virtual file system is associated with a plurality of storage devices and includes metadata associated with the requested file. The computing device checks the metadata to determine that a first set of data blocks of the requested file is present at the computing device and a second set of data blocks of the requested file is not present at the computing device. The computing device retrieves the second set of data blocks from the plurality of storage devices. The computing device rebuilds an instance of the requested file using the first set of data blocks and the retrieved second set of data blocks. The computing device updates the metadata of the requested file to reflect the presence of the retrieved second set of data blocks. The computing device serves the rebuilt instance of the requested file to the requesting application.
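The fetch-on-demand steps above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the disclosed implementation: the `FileMeta` class, its `present` set, the `local_cache` dictionary, and the `fetch_remote` callback are hypothetical names, and the remote storage devices are simulated by a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    block_ids: list                              # ordered block IDs of the file
    present: set = field(default_factory=set)    # IDs of blocks held locally


def serve_file(meta, local_cache, fetch_remote):
    """Rebuild a file, retrieving only the blocks that are not present locally."""
    # Check the metadata to split the blocks into present and missing sets.
    missing = [b for b in meta.block_ids if b not in meta.present]
    for block_id in missing:
        local_cache[block_id] = fetch_remote(block_id)
        meta.present.add(block_id)               # record the block's presence
    # Rebuild the file from local and newly retrieved blocks, in order.
    return b"".join(local_cache[b] for b in meta.block_ids)


# Usage: a dict simulates the remote devices; 'fetched' logs remote retrievals.
remote = {"b1": b"foo", "b2": b"bar", "b3": b"baz"}
fetched = []


def fetch_remote(block_id):
    fetched.append(block_id)
    return remote[block_id]


meta = FileMeta(block_ids=["b1", "b2", "b3"], present={"b1"})
cache = {"b1": b"foo"}
content = serve_file(meta, cache, fetch_remote)
```

Because the metadata is updated after each retrieval, a second request for the same file in this sketch is served without contacting the remote devices.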

In accordance with some embodiments, a computing device in association with a distributed storage system that includes a plurality of storage devices includes one or more processors, memory, and one or more programs stored in the memory for execution by the one or more processors. The one or more programs include instructions for: receiving from an application a file request through an instance of a virtual file system, wherein the virtual file system is associated with a plurality of storage devices and includes metadata associated with the requested file; checking the metadata to determine that a first set of data blocks of the requested file is present at the computing device and a second set of data blocks of the requested file is not present at the computing device; retrieving the second set of data blocks from the plurality of storage devices; rebuilding an instance of the requested file using the first set of data blocks and the retrieved second set of data blocks; updating the metadata of the requested file to reflect the presence of the retrieved second set of data blocks; and serving the rebuilt instance of the requested file to the requesting application.

In accordance with some embodiments, a computer readable storage medium stores one or more programs configured for execution by a computing device having one or more processors and memory storing one or more programs for execution by the one or more processors in association with a distributed storage system that includes a plurality of storage devices. The one or more programs comprise instructions to: receive from an application a file request through an instance of a virtual file system, wherein the virtual file system is associated with a plurality of storage devices and includes metadata associated with the requested file; check the metadata to determine that a first set of data blocks of the requested file is present at the computing device and a second set of data blocks of the requested file is not present at the computing device; retrieve the second set of data blocks from the plurality of storage devices; rebuild an instance of the requested file using the first set of data blocks and the retrieved second set of data blocks; update the metadata of the requested file to reflect the presence of the retrieved second set of data blocks; and serve the rebuilt instance of the requested file to the requesting application.

In accordance with some embodiments, a computer-implemented method for performing automatic differential data compression in association with storing a file at a distributed storage system is implemented at a computing device having one or more processors and memory. The memory stores one or more programs for execution by the one or more processors on the computing device, which is associated with a distributed storage system that includes a plurality of storage devices.

The computing device receives a request to create a revision of a virtual file system in a storage device. In some embodiments, the virtual file system has a commit tree that includes a plurality of directory nodes and file nodes, each directory node or file node having metadata corresponding to a respective directory or file of the virtual file system. For each of the plurality of directory nodes and file nodes, the computing device generates an object by serializing the tree node's associated metadata in a predefined order and creates an object ID from the serialized metadata. The computing device stores the object at the storage device if the object ID is not present in an object-storage mapping table associated with the virtual file system and inserts the object ID into the object-storage mapping table. The computing device stores the object-storage mapping table at the storage device. For each content block associated with a respective file of the virtual file system, the computing device creates a block ID from the content block. The computing device stores the content block at the storage device if the block ID is not present in a content block-storage mapping table associated with the virtual file system and inserts the block ID into the content block-storage mapping table. The computing device stores the content block-storage mapping table at the storage device.
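The store-if-absent behavior described above can be illustrated with a short sketch. Several details here are assumptions, since the text does not specify them: SHA-256 is used to derive IDs from content, JSON with sorted keys stands in for "serializing in a predefined order", and the names `object_id`, `serialize_metadata`, and `store_if_new` are hypothetical. One helper is used for both the object table and the content block table because the two follow the same pattern.

```python
import hashlib
import json


def object_id(payload: bytes) -> str:
    """Derive an ID from the bytes themselves (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def serialize_metadata(metadata: dict) -> bytes:
    """Serialize metadata in a fixed key order so equal metadata yields equal bytes."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def store_if_new(payload, mapping_table, storage, location):
    """Store payload only if its ID is absent from the mapping table."""
    oid = object_id(payload)
    if oid not in mapping_table:
        storage[oid] = payload          # write the object or block once
        mapping_table[oid] = location   # record where it was stored
    return oid


# Usage: identical content blocks are stored a single time.
storage, table = {}, {}
first = store_if_new(b"same bytes", table, storage, "device-A")
second = store_if_new(b"same bytes", table, storage, "device-A")
node_id = store_if_new(
    serialize_metadata({"name": "a.txt", "blocks": [first]}),
    table, storage, "device-A",
)
```

Deriving the ID from the content means that an unchanged file or directory produces the same ID across revisions, so only changed objects and blocks need to be written.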

In accordance with some embodiments, a computing device in association with a distributed storage system that includes a plurality of storage devices includes one or more processors, memory, and one or more programs stored in the memory for execution by the one or more processors. The one or more programs include instructions for: receiving a request to create a revision of a virtual file system in a storage device, wherein the virtual file system has a commit tree that includes a plurality of directory nodes and file nodes, each directory node or file node having metadata corresponding to a respective directory or file of the virtual file system; for each of the plurality of directory nodes and file nodes, generating an object by serializing the tree node's associated metadata in a predefined order and creating an object ID from the serialized metadata; and storing the object at the storage device if the object ID is not present in an object-storage mapping table associated with the virtual file system and inserting the object ID into the object-storage mapping table; storing the object-storage mapping table at the storage device; for each content block associated with a respective file of the virtual file system, creating a block ID from the content block; storing the content block at the storage device if the block ID is not present in a content block-storage mapping table associated with the virtual file system and inserting the block ID into the content block-storage mapping table; and storing the content block-storage mapping table at the storage device.

In accordance with some embodiments, a computer readable storage medium stores one or more programs configured for execution by a computing device having one or more processors and memory storing one or more programs for execution by the one or more processors in association with a distributed storage system that includes a plurality of storage devices. The one or more programs comprise instructions to: receive a request to create a revision of a virtual file system in a storage device, wherein the virtual file system has a commit tree that includes a plurality of directory nodes and file nodes, each directory node or file node having metadata corresponding to a respective directory or file of the virtual file system; for each of the plurality of directory nodes and file nodes, generate an object by serializing the tree node's associated metadata in a predefined order and create an object ID from the serialized metadata; and store the object at the storage device if the object ID is not present in an object-storage mapping table associated with the virtual file system and insert the object ID into the object-storage mapping table; store the object-storage mapping table at the storage device; for each content block associated with a respective file of the virtual file system, create a block ID from the content block; store the content block at the storage device if the block ID is not present in a content block-storage mapping table associated with the virtual file system and insert the block ID into the content block-storage mapping table; and store the content block-storage mapping table at the storage device.

In accordance with some embodiments, a computer-implemented method for computing parity data associated with a distributed storage system is implemented at a computing device having one or more processors and memory. The memory stores one or more programs for execution by the one or more processors on the computing device, which is associated with a distributed storage system that includes a plurality of storage devices.

The computing device receives a request for a file, which is associated with a set of data segments. In response, the computing device attempts to retrieve the set of data segments from one or more storage devices communicatively connected to the computing device. For each missing data segment that the computing device fails to retrieve, the computing device identifies a data recovery scheme. The data recovery scheme involves at least one base data segment that is present at the computing device and at least one parity data segment that is located at one of the storage devices remote from the computing device. After retrieving the at least one parity data segment from the storage devices, the computing device computes the missing data segment by applying the data recovery scheme to the at least one base data segment and the at least one parity data segment retrieved from the storage devices and builds the requested file using the computed missing data segments.
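As one concrete illustration of such a data recovery scheme, the sketch below uses single XOR parity over equal-length segments. XOR parity is an assumption chosen for simplicity; the text above does not state which scheme is used, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(segments):
    """Compute one parity segment as the XOR of all data segments."""
    parity = bytes(len(segments[0]))  # all-zero segment of matching length
    for segment in segments:
        parity = xor_bytes(parity, segment)
    return parity


def recover_missing(base_segments, parity):
    """Recover the single missing segment from the surviving segments and the parity."""
    missing = parity
    for segment in base_segments:
        missing = xor_bytes(missing, segment)
    return missing


# Usage: the middle segment is lost and recomputed from the other two plus parity.
segments = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(segments)
recovered = recover_missing([segments[0], segments[2]], parity)
rebuilt_file = b"".join([segments[0], recovered, segments[2]])
```

With a single XOR parity segment, this sketch can recover only one missing segment per parity group; tolerating more losses would require a different scheme.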

In accordance with some embodiments, a computing device in association with a distributed storage system that includes a plurality of storage devices includes one or more processors, memory, and one or more programs stored in the memory for execution by the one or more processors. The one or more programs include instructions for: receiving a request for a file, wherein the file is associated with a set of data segments; retrieving the set of data segments from one or more storage devices communicatively connected to the computing device; for each missing data segment that the computing device fails to retrieve, identifying a data recovery scheme, wherein the data recovery scheme involves at least one base data segment that is present at the computing device and at least one parity data segment that is located at one of the storage devices remote from the computing device; retrieving the at least one parity data segment from the storage devices; and computing the missing data segment by applying the data recovery scheme to the at least one base data segment and the at least one parity data segment retrieved from the storage devices; and building the requested file using the computed missing data segments.

In accordance with some embodiments, a computer readable storage medium stores one or more programs configured for execution by a computing device having one or more processors and memory storing one or more programs for execution by the one or more processors in association with a distributed storage system that includes a plurality of storage devices. The one or more programs comprise instructions to: receive a request for a file, wherein the file is associated with a set of data segments; retrieve the set of data segments from one or more storage devices communicatively connected to the computing device; for each missing data segment that the computing device fails to retrieve, identify a data recovery scheme, wherein the data recovery scheme involves at least one base data segment that is present at the computing device and at least one parity data segment that is located at one of the storage devices remote from the computing device; retrieve the at least one parity data segment from the storage devices; and compute the missing data segment by applying the data recovery scheme to the at least one base data segment and the at least one parity data segment retrieved from the storage devices; and build the requested file using the computed missing data segments.

In accordance with some embodiments, a computer-implemented method for merging two or more revisions to a virtual file system associated with a distributed storage system is implemented at a computing device having one or more processors and memory. The memory stores one or more programs for execution by the one or more processors on the computing device, which is associated with a distributed storage system that includes a plurality of storage devices. The virtual file system has an associated commit tree corresponding to a first revision of the virtual file system.

The computing device retrieves one or more metadata blocks from a plurality of storage devices. The metadata blocks are associated with a second revision of the virtual file system. The computing device updates the commit tree by processing the retrieved metadata blocks in a predefined order. In particular, the computing device replaces a first component of the virtual file system corresponding to the first revision of the commit tree with an associated component of the virtual file system corresponding to the second revision of the commit tree if a first predefined set of conditions is met; and the computing device identifies a second component of the virtual file system corresponding to the first revision of the commit tree as being associated with a component of the virtual file system corresponding to the second revision of the commit tree if a second predefined set of conditions is met.
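The replace-or-flag decision above can be sketched as follows. The two predefined sets of conditions are not spelled out at this point in the text, so this sketch assumes a conventional three-way comparison against a common ancestor revision: a component is replaced when only the second revision changed it, and flagged as conflicted (the term used in the Abstract) when both revisions changed it differently. The function name, the `base` argument, and the path-to-ID dictionaries are hypothetical.

```python
def merge_revisions(base, local, remote):
    """Merge a second (remote) revision into a first (local) revision.

    Each argument maps a component path to its object ID. 'base' is an
    assumed common ancestor revision; it is not mentioned in the text.
    Returns the merged mapping and the list of conflicted paths.
    """
    merged = dict(local)
    conflicts = []
    # Sorting the paths stands in for "a predefined order".
    for path in sorted(set(local) | set(remote)):
        b, l, r = base.get(path), local.get(path), remote.get(path)
        if l == r:
            continue                     # both revisions already agree
        if l == b:
            # Assumed first condition: only the second revision changed it.
            if r is None:
                merged.pop(path, None)   # removed in the second revision
            else:
                merged[path] = r         # replace with the second revision
        elif r == b:
            continue                     # only the first revision changed it
        else:
            # Assumed second condition: both changed it in different ways.
            conflicts.append(path)
    return merged, conflicts


# Usage with hypothetical object IDs.
base = {"a": "1", "b": "1", "c": "1"}
local = {"a": "1", "b": "2", "c": "2"}
remote = {"a": "3", "b": "1", "c": "4", "d": "5"}
merged, conflicts = merge_revisions(base, local, remote)
```

In this sketch a conflicted component keeps its first-revision content in the merged result, so resolution is left to a later step.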

In accordance with some embodiments, a computing device on which a virtual file system operates in association with a distributed storage system that includes a plurality of storage devices includes one or more processors, memory, and one or more programs stored in the memory for execution by the one or more processors. The virtual file system has an associated commit tree corresponding to a first revision of the virtual file system. The one or more programs include instructions for: retrieving one or more metadata blocks from a plurality of storage devices, wherein the metadata blocks are associated with a second revision of the virtual file system; and updating the commit tree by processing the retrieved metadata blocks in a predefined order, further including: replacing a first component of the virtual file system corresponding to the first revision of the commit tree with an associated component of the virtual file system corresponding to the second revision of the commit tree if a first predefined set of conditions is met; and identifying a second component of the virtual file system corresponding to the first revision of the commit tree as being associated with a component of the virtual file system corresponding to the second revision of the commit tree if a second predefined set of conditions is met.

In accordance with some embodiments, a computer readable storage medium stores one or more programs configured for execution by a computing device having one or more processors and memory storing one or more programs for execution by the one or more processors in association with a virtual file system and a distributed storage system that includes a plurality of storage devices. The one or more programs comprise instructions to: retrieve one or more metadata blocks from a plurality of storage devices, wherein the metadata blocks are associated with a second revision of the virtual file system; and update the commit tree by processing the retrieved metadata blocks in a predefined order, further including: replacing a first component of the virtual file system corresponding to the first revision of the commit tree with an associated component of the virtual file system corresponding to the second revision of the commit tree if a first predefined set of conditions is met; and identifying a second component of the virtual file system corresponding to the first revision of the commit tree as being associated with a component of the virtual file system corresponding to the second revision of the commit tree if a second predefined set of conditions is met.

Thus, methods and systems are provided that make it more convenient and more efficient for a computing device to synchronize local storage with a distributed storage system including a plurality of storage devices (e.g., a storage cloud) through a virtual file system. From a user's perspective, the virtual file system acts like a hard drive, except the data is stored elsewhere. The virtual file system provides the flexibility for a user to access its data from multiple computing devices while giving the user a familiar view of the user's files. By unifying a user's storage space, the virtual file system makes available to the user the unused space while implementing multiple protection mechanisms for the user's data. In addition, the virtual file system simplifies data migration between storage devices hosted by different service providers by allowing a user to add new service providers and/or drop existing service providers in response to service price changes, location changes, network availability changes, etc. For example, with the availability of large hard drives, the virtual file system allows a user to create a personal storage cloud by combining the local hard drives with the on-line storage services so that the user can easily manage its data archives. By doing so, the virtual file system provides a user the same experience, online or offline. The virtual file system's built-in block management scheme uses both delta compression and block-level content addressable storage (CAS) to minimize duplication across files. The virtual file system automatically creates versions of every file it manages so that a user can go back to any version of a file in the file history or get an entire set of files at a particular moment. In sum, more efficient synchronization results in a user having access to the updated data at the computing device more quickly; and more efficient usage of the network bandwidth also reduces the risk of network traffic jams, leaving more bandwidth available for other tasks.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the aforementioned embodiments of the invention as well as additional embodiments thereof, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.

FIG. 1A is a block diagram illustrating a plurality of computing devices, each including a virtual file system, that are communicatively connected to a distributed storage system according to some embodiments.

FIG. 1B is a block diagram illustrating a plurality of computing devices that are communicatively connected to a distributed storage system that includes a virtual file system according to some embodiments.

FIG. 1C is a block diagram illustrating a plurality of computing devices, each having an embedded virtual file system device, that are communicatively connected to a distributed storage system according to some embodiments.

FIG. 1D is a block diagram illustrating a plurality of components of a core engine according to some embodiments.

FIGS. 2A to 2F are block diagrams illustrating data structures in association with a virtual file system according to some embodiments.

FIG. 3A is a flow chart illustrating a process of initializing a virtual file system at a computing device according to some embodiments.

FIG. 3B is a flow chart illustrating a process of a virtual file system returning a file in response to a file request from an application at a computing device according to some embodiments.

FIG. 3C is a flow chart illustrating a process of a virtual file system retrieving a set of data blocks associated with a file from a storage device according to some embodiments.

FIG. 3D is a flow chart illustrating a process of a virtual file system generating metadata for a new revision of a file and synchronizing the new revision with a storage device according to some embodiments.

FIG. 3E is a flow chart illustrating a process of a virtual file system generating metadata for a deletion of a file and synchronizing the deletion with a storage device according to some embodiments.

FIG. 4A is a flow chart illustrating a process of a virtual file system processing metadata blocks and content blocks retrieved from a storage device according to some embodiments.

FIG. 4B is a flow chart illustrating a process of a virtual file system computing missing metadata or content using parity data retrieved from a storage device according to some embodiments.

FIG. 5A is a flow chart illustrating a process of a virtual file system synchronizing metadata and data with a storage device according to some embodiments.

FIG. 5B is a flow chart illustrating a process of a virtual file system serializing metadata and data to be synchronized with a storage device according to some embodiments.

FIG. 5C is a block diagram illustrating an intra-file parity computation scheme according to some embodiments.

FIG. 5D is a block diagram illustrating an inter-file parity computation scheme according to some embodiments.

FIGS. 6A to 6E are block diagrams illustrating multiple stages of an exemplary commit tree according to some embodiments.

FIG. 7 is a block diagram illustrating a client or server device equipped with a virtual file system according to some embodiments.

FIGS. 8A to 8F are exemplary screenshots of a virtual file system according to some embodiments.

DESCRIPTION OF EMBODIMENTS

Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first set of data blocks could be termed a second set of data blocks, and, similarly, a second set of data blocks could be termed a first set of data blocks, without departing from the scope of the present invention.

The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.

As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event],” depending on the context.

FIG. 1A is a block diagram illustrating a plurality of computing devices, each including a virtual file system, that are communicatively connected to a distributed storage system according to some embodiments. In some embodiments, a computing device is a laptop/desktop/tablet computer or a portable communication device such as a mobile telephone. For simplification, a computing device may be referred to as “a client” or “a client computer” throughout the specification.

As shown in FIG. 1A, clients 101-A and 101-B are communicatively connected to a storage cloud 120, each client including a virtual file system 102. In some embodiments, the virtual file system includes an application user interface (API) 103, a local file system 105, and a core engine 107. The API 103 is a software application for accessing the storage cloud 120 through the core engine 107. In some embodiments, the API 103 is an application dedicated for this purpose. In some embodiments, the API 103 is an application (e.g., a web browser application) that can perform multiple functions including accessing the storage cloud 120. In some embodiments, the local file system 105 is the file system associated with the operating system (e.g., UNIX, Windows, or Linux) running on the client 101. For example, a user can use the local file system 105 to access files not managed by the virtual file system 102. The core engine 107 refers to a set of application modules that are responsible for managing different aspects of the virtual file system 102, such as retrieving files from a remote storage device, synchronizing one copy of a file stored at the client 101 with another copy of the same file stored at a remote storage device, etc. A more detailed description of the core engine 107 is provided below in connection with FIG. 1D.

In some embodiments, the storage cloud 120 is a distributed, heterogeneous storage system including multiple types of storage devices such as local storage devices 109 (e.g., thumb drive, hard drive, network attached storage (NAS), etc.) and remote (and often distributed) cloud storage devices. In other words, the term “cloud” in this application has a broader scope that may cover storage devices that are physically local to or remote from the virtual file system. In some embodiments, the remote cloud storage devices are provided as a cloud storage service by a third party (e.g., Amazon S3). In some embodiments, the cloud storage service includes a remote cloud platform 123, a set of cloud service modules 125, and a set of cloud storage devices 127. The remote cloud platform is typically a front end accessible through a web server. The cloud service modules 125 are responsible for performing operations (e.g., queuing, logging, billing, etc.) in support of the storage service. The cloud storage devices are associated with a hardware architecture (e.g., storage area network) that supports massive data storage/access through network connections such as the Internet.

In some embodiments, a user of the virtual file system 102, which may be a person or a software application, submits a request for a file to the API 103. In response to the request, the API 103 checks if the requested file is available at the local file system 105. If so, it returns the requested file to the requesting user. If not, the API 103 may forward the file request to the core engine 107. As will be explained in detail below, the core engine 107 determines whether or not and how to retrieve information associated with the file (e.g., metadata and data) from a storage device within the storage cloud 120. After receiving the information, the core engine 107 then rebuilds the requested file in the local file system 105 and makes it available for the user to access. Upon detection of the user's updates to the file, the core engine 107 then generates a new revision of the file and synchronizes the revised file including its metadata and data with one or more storage devices associated with the storage cloud 120 to make sure that all the user updates are appropriately saved and protected against potential file loss and/or unauthorized access. Note that the terms “revision” and “version” are used interchangeably throughout the specification.

FIG. 1B is a block diagram illustrating a plurality of computing devices that are communicatively connected to a distributed storage system that includes a virtual file system according to some embodiments. Note that the system infrastructure shown in FIG. 1B is similar to the one shown in FIG. 1A in many aspects except that the virtual file system is moved from the client side into the storage cloud 140.

In particular, there is a virtual file system server 143 in the storage cloud 140 for processing file requests from different clients 131. The virtual file system server 143 further includes an API 144 and a core engine 146 for providing the cloud storage service as described above in connection with FIG. 1A using the cloud service modules 145 and the cloud storage devices 147. In some embodiments, a client 131 includes a client application 133 (e.g., a web browser) for receiving file requests and serving the requested files, and a client assistant 134 (e.g., a web browser plug-in application) for processing the file requests including storing the requested files at a location within the local file system 135. In some embodiments, the client assistant 134 receives instructions from the virtual file system server 143 for storing information associated with a file in the local storage devices 137 that are part of the storage cloud 140 but are located physically close to the client 131-A.

In some embodiments, the system architecture shown in FIG. 1B is referred to as a thin-client or zero-footprint deployment of the virtual file system because most of the transactions such as those performed by the core engine 146 do not happen on the client side but within the storage cloud 140. As such, the number of dedicated software applications on the client side can be minimized. In some embodiments, this system architecture is better suited for those computing devices with limited capacity (e.g., a mobile phone or a personal digital assistant).

FIG. 1C is a block diagram illustrating a plurality of computing devices, each having an embedded virtual file system device, that are communicatively connected to a distributed storage system according to some embodiments. Compared with the other embodiments described above, FIG. 1C depicts a virtual file system device 157 that is communicatively connected to a client 151 and to a storage cloud 160. In some embodiments, the virtual file system device 157 is an embedded system or a standalone device that includes processors, memory, and software applications like the API 153, the local file system 155, and the core engine 158. The functionalities of these applications within the virtual file system device 157 are similar to their counterparts described above. A user can attach the virtual file system device 157 to his or her personal computer. The virtual file system device 157 is able to receive a file request from a client 151 (e.g., a PC) and process it by retrieving the requested file from the storage cloud 160 or its local storage device 159 and returning the file to the client.

It is noted that the system architectures described above in connection with FIGS. 1A-1C are for illustrative purposes and one skilled in the art would be able to develop other system architectures based on the teachings herein with no difficulty. It is further noted that the different types of system architecture described above are by no means mutually exclusive. It is possible for a hybrid virtual file system to combine the different types of architecture by leveraging their respective advantages. For example, a client may have a virtual file system shown in FIG. 1A while allowing a user to submit file requests through a web server shown in FIG. 1B.

As described above, the core engine of the virtual file system includes multiple components for managing different aspects of the virtual file system 102. FIG. 1D is a block diagram illustrating a plurality of components of a core engine according to some embodiments.

As shown in FIG. 1D, the core engine 170 communicatively connects the client 194 to the storage devices 196. In order to explain how the core engine 170 supports different file-related requests, the core engine 170 is divided into multiple application modules, each module being responsible for performing one or more specific operations. In some embodiments, the core engine 170 includes a virtual file system (VFS) management module 184 for interfacing with the client 194, e.g., receiving client requests for downloading or uploading certain files or instructions for modifying the virtual file system's configuration (e.g., its replication policy) and sending files to the requesting user. The VFS management module 184 is communicatively connected to the VFS configuration module 172 and the VFS retrieval module 186.

The VFS configuration module 172 is responsible for configuring the virtual file system (in particular, the core engine 170) to perform operations in a user-specified manner. For example, a user may specify that certain files or types of files not be stored in any remote storage devices for security reasons or that certain files or types of files always be encrypted before being pushed to a remote storage device. In some embodiments, a user may define a tier structure for the storage cloud. For example, tier one refers to local storage devices, tier two refers to remote storage devices offered by a service provider with high-speed access, and tier three corresponds to remote storage devices without high-speed access. In addition, the user may further specify that files having a predefined type or size or both should be stored at a particular tier of storage devices. For example, the user may specify that video files should be stored in a local storage device as much as possible to avoid the delay when a user tries to access these video files. In some embodiments, a user may provide instructions on how to achieve data redundancy (e.g., through parity computation) to recover from potential system failure. In some embodiments, the VFS configuration module 172 receives the user preferences or requirements through the VFS management module 184 and saves them at a predefined location in the local file system (e.g., on a per-user basis) for controlling the behavior of the virtual file system.
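
The tier preferences described above can be pictured with a small sketch. The rule format and the names `StorageTier`, `TierRule`, and `choose_tier` are hypothetical and are not taken from the embodiments; the sketch only assumes that rules are evaluated in order and that the first matching rule decides the tier.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class StorageTier(IntEnum):
    """Hypothetical encoding of the three example tiers."""
    LOCAL = 1        # tier one: local storage devices
    REMOTE_FAST = 2  # tier two: remote devices with high-speed access
    REMOTE_SLOW = 3  # tier three: remote devices without high-speed access


@dataclass
class TierRule:
    """One user preference: files of a type and/or minimum size go to a tier."""
    tier: StorageTier
    file_type: Optional[str] = None  # None matches any file type
    min_size: Optional[int] = None   # in bytes; None matches any size

    def matches(self, file_type: str, size: int) -> bool:
        if self.file_type is not None and self.file_type != file_type:
            return False
        if self.min_size is not None and size < self.min_size:
            return False
        return True


def choose_tier(rules: List[TierRule], file_type: str, size: int,
                default: StorageTier = StorageTier.REMOTE_SLOW) -> StorageTier:
    """Return the tier of the first matching rule, or the default tier."""
    for rule in rules:
        if rule.matches(file_type, size):
            return rule.tier
    return default


# Example preferences: keep video local, send other large files to fast remote storage.
rules = [
    TierRule(StorageTier.LOCAL, file_type="video"),
    TierRule(StorageTier.REMOTE_FAST, min_size=10 * 1024 * 1024),
]
```

With these example rules, a video file of any size resolves to the local tier, while a small text file falls through to the default tier.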

The VFS retrieval module 186 is responsible for retrieving information associated with a file in accordance with instructions from the VFS management module 184. As will be explained below in connection with FIGS. 2A to 2F, a file is first converted into a set of blocks before being stored in any local or remote storage device according to some embodiments. Each block is a self-contained or self-describing unit to be allocated at a particular location within a storage device. The storage device is unaware of the relationship between a block it hosts and a corresponding file managed by the virtual file system. In some embodiments, the VFS retrieval module 186 identifies one or more blocks associated with a particular file and then passes information about the identified blocks to a storage device management module 192 that is responsible for fetching the corresponding blocks from the storage devices 196. In some embodiments, the storage device management module 192 includes multiple sub-modules, each sub-module configured to communicate with one or more particular storage devices for fetching/pushing blocks from/to the corresponding storage devices.

In some embodiments, the virtual file system allows one block to be shared among multiple revisions of the same file or even multiple files regardless of whether they are related or not. This feature is effectively a delta compression scheme to reduce the storage usage and improve the network usage. The core engine 170 includes a block location management module 190 for implementing this feature by generating and managing a mapping between the virtual file system and the storage devices 196. For each block to be fetched or pushed, the storage device management module 192 (or one of its sub-modules) queries the block location management module 190 for a corresponding location of the block at a particular storage device and then performs the specific operations. In some embodiments, the block location management module 190 may be part of the storage device management module 192.

To be a reliable file management system, the virtual file system must be able to recover files lost due to an unexpected, and possibly fatal, system failure or network breakdown. For example, in response to the instructions from the VFS retrieval module 186, the storage device management module 192 attempts to fetch a block from a specific storage device but receives an error message indicating that the requested block is no longer stored at the storage device. In this case, the VFS retrieval module may instruct the VFS recovery module 188 to recover the lost block to satisfy the client request. In some embodiments, the virtual file system implements a parity-based data recovery scheme before it pushes any block to the storage cloud. Upon receipt of the block recovery instruction, the VFS recovery module 188 checks if it has access to the blocks necessary for reconstructing the lost block according to the parity-based data recovery scheme. If there is any block not available, the VFS recovery module 188 may notify the storage device management module 192 to retrieve the block from a respective storage device. After collecting all the blocks, the VFS recovery module 188 then performs the parity-based data recovery scheme to rebuild the missing block and returns the rebuilt block to the VFS retrieval module 186 for performing the subsequent processing steps in connection with the client request.
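
The specification does not fix a particular parity computation. As one illustration only, the sketch below assumes a single XOR parity block computed over equal-length blocks, which allows exactly one missing block of the group to be rebuilt; the function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR a list of equal-length blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks: List[bytes]) -> bytes:
    """Compute the parity block before the data blocks are pushed."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])


# Three data blocks are protected by one parity block.
blocks = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(blocks)

# Suppose the second block is reported missing by a storage device.
rebuilt = recover_block([blocks[0], blocks[2]], parity)
```

In this simplified scheme the recovery step needs every other block of the group plus the parity block, which mirrors the requirement above that all necessary blocks be collected before the missing block is rebuilt.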

As noted above, a file is stored at the storage devices in the form of one or more blocks. Because of the file-to-block conversion, after the virtual file system retrieves all the blocks associated with a file from the storage devices, it cannot serve the blocks to a requesting client directly. Instead, the virtual file system needs to perform a block-to-file conversion to rebuild the client requested file. Throughout this application, the file-to-block conversion may also be referred to as “block packing” or “pack a block” while the block-to-file conversion as “block unpacking” or “unpack a block.”

Referring again to FIG. 1D, the VFS management module 184 or the VFS retrieval module 186 forwards the blocks retrieved from the storage devices to the VFS cache module 174. The VFS cache module 174 is responsible for unpacking these blocks and rebuilding the client requested file. In some embodiments, the retrieved blocks are initially stored at the volume data repository 182. The VFS cache module 174 unpacks the blocks into one or more data blocks containing the content of a file and one or more objects containing the metadata of the file. In some embodiments, the data blocks are stored at the block store 178 and the metadata objects are stored at the object store 180. In some embodiments, the block store 178, the object store 180, and the volume data repository 182 are merged into one data structure entity. For illustration, they are abstracted into three entities throughout the specification. Using the data blocks and the metadata objects, the VFS cache module 174 rebuilds the client-requested file and stores the file in the local file system cache 176. In addition, the VFS cache module 174 also builds a hierarchical tree structure referencing the file using the metadata objects associated with the file and updates the tree structure to keep track of all subsequent operations to the file. Throughout this application, this tree structure is sometimes referred to as “commit tree” or “commit graph.”

In some embodiments, the data traffic across the core engine 170 is bi-directional. As described above, data may come from the storage devices 196, pass through the core engine 170, and arrive at the client 194. This is the process of the virtual file system retrieving data from the storage devices to satisfy a file request from the client 194. Conversely, data may come from the client 194, pass through the core engine 170, and reach the storage devices 196. This is the process of the virtual file system synchronizing data with the storage devices. During this data synchronization process, the VFS cache module 174 is responsible for breaking a file into the data blocks in the block store 178 and metadata objects in the object store 180 and then packing them into a set of blocks stored in the volume data repository. Through the VFS retrieval module 186 and the storage device management module 192, the VFS cache module 174 pushes the blocks into the storage devices 196.

In some embodiments, a user may have chosen a data redundancy policy for the virtual file system in order to protect against potential data loss. In this case, the VFS replication module 175 receives the data redundancy policy from the VFS configuration module 172 and implements the policy on the blocks stored at the volume data repository before they are pushed into the storage devices 196.

Although the core engine 170 is described above in connection with a file request and the processing of a corresponding file, those skilled in the art would understand that the same methodology is applicable to the virtual file system in its entirety or any portion thereof. Note that the arrangement of the components in the core engine 170 as shown in FIG. 1D is for illustrative purposes. It would be apparent to those skilled in the art that two or more modules can be merged into one module that performs the equivalent functions or one module can be divided into multiple sub-modules for performing the equivalent functions. For example, the storage device management module 192 can incorporate the block location management module 190. Moreover, the connections of the components in the core engine 170 are used for illustrating some interactions between these modules. It would be apparent to those skilled in the art that there are interactions between two modules that are not shown in FIG. 1D. For example, the VFS management module 184 may be able to access the local file system cache 176 directly in order to determine whether a client-requested file is available at the local file system cache 176.

FIGS. 2A to 2F are block diagrams illustrating data structures in association with a virtual file system according to some embodiments. These data structures are used by the components within the core engine 170 to support the operations described above in connection with FIG. 1D. In particular, the data structures such as the file node 201, the directory node 203, and the commit node 205 correspond to the respective components of the commit tree created by the core engine 170 for tracking the state of each file and directory associated with the virtual file system. An exemplary commit tree and its evolution are described below in connection with FIGS. 6A to 6D. As will be explained below in detail, the core engine 170 uses the commit tree structure to record the operations performed on the virtual file system so that it can easily locate a client-requested file or directory or a client-requested revision of a file or directory.

FIG. 2A depicts an exemplary data structure of a file node 201, which is located at a leaf of the commit tree. The file node 201 includes a file node ID 201-1, a reference to a predecessor file node 201-3, one or more file attributes 201-5, a block catalog 201-7, and a reference to the file at the local file system cache 201-9. In some embodiments, the file node ID 201-1 is generated by applying a hash algorithm (e.g., SHA-256) to the other metadata entries in the file node 201 to ensure its uniqueness. The reference to a predecessor file node 201-3 is used for identifying a previous revision of the same file. Using this parameter, the core engine can locate a client-requested revision of a file from its current revision by traversing across the commit tree. Note that this entry is empty if the file node 201 identifies the initial version of a file. The file attributes 201-5 may include its name and size, its current state, its revision timestamp, its content hash, etc. In some embodiments, a file in the virtual file system may switch between the following states:

-   local: the file only exists at the local file system, not at the storage cloud;
-   modified: the file at the local file system has been modified, but not yet synchronized with the storage cloud; and
-   cloud: the file at the local file system has been synchronized with the storage cloud.

In some embodiments, the block catalog includes a set of block IDs, each block ID identifying a respective content block within the storage cloud. As will be explained below, the use of a block catalog to identify a file's content makes it possible for different revisions of the same file to share the same set of content blocks that have not been changed between these revisions while each revision has a respective set of content blocks that represents the changes to the content made by that revision. In some embodiments, this block sharing concept is expanded from different revisions of the same file to different files to increase the storage cloud's efficiency. In some embodiments, the reference to the file at the local file system cache 201-9 is optional. This entry may be filled if there is a local copy of the file at the local file system cache, which was generated by the virtual file system in response to either a request for retrieving the file from a remote storage device or a request for uploading the file into the virtual file system. In some embodiments, this entry is implied by the relative path of the file in the local file system cache 176.
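
A minimal sketch of a file node and its block catalog follows. The class `FileNode`, the JSON serialization used before hashing, and the sample attribute names are assumptions made for illustration; the source only states that the node ID is a hash (e.g., SHA-256) of the node's other metadata entries and that revisions may share unchanged content blocks.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


def block_id(data: bytes) -> str:
    """Content-derived block ID (SHA-256 of the block data)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class FileNode:
    attributes: Dict[str, object]      # e.g. name, state, timestamp
    block_catalog: List[str]           # ordered block IDs of the file content
    predecessor: Optional[str] = None  # node ID of the previous revision, if any

    @property
    def node_id(self) -> str:
        """Derive the node ID from the other entries (assumed JSON encoding)."""
        payload = json.dumps(
            {
                "predecessor": self.predecessor,
                "attributes": self.attributes,
                "block_catalog": self.block_catalog,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Revision 1 of a file consists of two content blocks.
rev1_blocks = [block_id(b"chapter 1"), block_id(b"chapter 2")]
rev1 = FileNode({"name": "book.txt", "state": "cloud"}, rev1_blocks)

# Revision 2 changes only the second block, so the first block is shared.
rev2_blocks = [rev1_blocks[0], block_id(b"chapter 2, revised")]
rev2 = FileNode({"name": "book.txt", "state": "modified"}, rev2_blocks,
                predecessor=rev1.node_id)

shared = set(rev1.block_catalog) & set(rev2.block_catalog)
```

Because the node ID is derived on demand rather than stored, the sketch also reflects the variant in which the ID is not kept in the node itself.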

Note that not all the attributes shown in FIG. 2A are necessary components of the file node's data structure. This observation also applies to the directory node and the commit node described below. In some embodiments, the file node ID 201-1 is not kept in the file node 201 because it can be dynamically generated from the other components of the data structure. In some embodiments, the file node may include the storage policy associated with the file. By having the storage policy, the virtual file system supports the file-level configuration of its storage strategy. This observation also applies to the directory node and the commit node described below such that the directory-level storage policy applies to all the files and child directories that do not have their respective storage policies and the commit-level storage policy applies to all the associated files and directories that do not have their own storage policy. In other words, a lower-level tree node's storage policy can override a higher-level tree node's storage policy.
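
The override rule can be sketched as a lookup that walks from the file up toward the commit node. The function name and the representation of a policy as a plain string are hypothetical; `None` stands for a node that carries no policy of its own.

```python
from typing import Optional, Sequence


def effective_policy(commit_policy: str,
                     directory_policies: Sequence[Optional[str]],
                     file_policy: Optional[str]) -> str:
    """Resolve the storage policy that applies to one file.

    directory_policies is ordered from the root directory down to the
    file's parent directory.
    """
    if file_policy is not None:
        return file_policy                 # file-level policy wins
    for policy in reversed(directory_policies):
        if policy is not None:
            return policy                  # nearest enclosing directory wins
    return commit_policy                   # fall back to the commit level


# The parent directory sets "local-only"; the file itself sets nothing.
resolved = effective_policy("replicate-2x", [None, "local-only"], None)
```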

FIG. 2B depicts an exemplary data structure of a directory node 203, which is an intermediate node of the commit tree (unless it corresponds to an empty directory in the virtual file system). The directory node 203 includes a directory node ID 203-1, a reference to a predecessor directory node 203-3, a reference to a child directory node 203-5, a reference to a file node 203-7, one or more directory attributes 203-9, and a reference to a directory in the local file system cache 203-11. In some embodiments, the directory node ID 203-1 is generated by applying a hash algorithm (e.g., SHA-256) to the other metadata entries in the directory node 203. The reference to a predecessor directory node 203-3 is used for identifying a previous revision of the same directory. Using this parameter, the core engine can locate a client-requested revision of a directory from its current revision by traversing across the commit tree. Note that this entry is empty if the directory node 203 corresponds to the initial version of a directory. The reference to a child directory node 203-5 is used for traversing from the current directory level to the next directory level down the hierarchical structure of the commit tree. Similarly, the reference to a file node 203-7 is used for reaching a particular file node associated with the current directory in the virtual file system. The members of the directory attributes 203-9 are similar to those of the file attributes 201-5 except that they are associated with a particular directory, not a file. In some embodiments, the reference to the directory at the local file system cache 203-11 is optional. This entry may be filled if there is a local copy of the directory at the local file system cache, which was generated by the virtual file system in response to either a request for retrieving the directory from a remote storage device or a request for uploading the directory into the virtual file system.

FIG. 2C depicts an exemplary data structure of a commit node 205, which is the root node of a particular branch of the commit tree. In some embodiments, each tree branch points to a particular revision of the root directory of the virtual file system. The commit node 205 includes a commit node ID 205-1, a reference to a predecessor commit node 205-3, a reference to the root directory node 205-5, one or more configuration parameters 205-7, and a commit description 205-9. In some embodiments, the commit node ID 205-1 is generated by applying a hash algorithm (e.g., SHA-256) to the other metadata entries in the commit node 205 in a predefined order. The reference to a predecessor commit node 205-3 is used for identifying a previous commit tree branch of the same virtual file system. Using this parameter, the core engine can locate a client-requested commit tree branch from the current commit tree branch by traversing across the commit tree. Note that this entry is empty if the commit node 205 corresponds to the initial commit tree branch of the virtual file system. The reference to the root directory node 205-5 is used for traversing down through the current commit tree branch. The configuration parameters 205-7 are used for specifying the commit-level data redundancy policies as well as the encryption/decryption keys used for protecting the files associated with the commit node 205. In some embodiments, the commit node 205 includes a textual description to characterize the unique aspects of the commit branch, e.g., what file/directory modifications have been applied to the files and directories associated with the commit node 205.

As noted above, the storage device management module 192, in connection with the block location management module 190, determines the identity of a storage device and a specific location at the storage device from which a block should be fetched or to which a block should be pushed. Because different types of storage devices may work differently, the core engine may have an interfacing module, which is also known as an “adapter,” for a particular type of storage devices in order to store blocks in the storage devices. For example, one adapter is used for communicating with storage devices that can be attached to a computer and become part of the computer's file system, such as hard drive, thumb drive, network attached storage, etc. Another adapter is used for communicating with a cloud storage service provided by a third party such as Amazon S3. In some embodiments, an adapter is configured to perform at least three operations: (i) retrieving a metadata or content block for a given block ID within a particular data volume; (ii) storing within a storage device a metadata or content block identified by a block ID within a particular data volume; and (iii) returning a set of block IDs for a given data volume, which may include the location of each block at a respective storage device. One skilled in the art can develop an adapter for another type of storage devices or services or other operations beyond the three operations based on the teachings herein.
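
The three adapter operations can be sketched as an abstract interface. The class and method names (`Adapter`, `get_block`, `put_block`, `list_block_ids`) are hypothetical, and the in-memory implementation merely stands in for a real storage device or service so that the sketch is runnable.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Adapter(ABC):
    """Hypothetical interface covering the three operations listed above."""

    @abstractmethod
    def get_block(self, volume_id: str, block_id: str) -> bytes:
        """(i) Retrieve a block for a given block ID within a data volume."""

    @abstractmethod
    def put_block(self, volume_id: str, block_id: str, data: bytes) -> None:
        """(ii) Store a block identified by a block ID within a data volume."""

    @abstractmethod
    def list_block_ids(self, volume_id: str) -> List[str]:
        """(iii) Return the block IDs held for a given data volume."""


class InMemoryAdapter(Adapter):
    """Stand-in for a device-specific adapter; keeps blocks in a dict."""

    def __init__(self) -> None:
        self._blocks: Dict[Tuple[str, str], bytes] = {}

    def get_block(self, volume_id: str, block_id: str) -> bytes:
        return self._blocks[(volume_id, block_id)]

    def put_block(self, volume_id: str, block_id: str, data: bytes) -> None:
        self._blocks[(volume_id, block_id)] = data

    def list_block_ids(self, volume_id: str) -> List[str]:
        return sorted(b for (v, b) in self._blocks if v == volume_id)


adapter = InMemoryAdapter()
adapter.put_block("vol-1", "b2", b"second")
adapter.put_block("vol-1", "b1", b"first")
adapter.put_block("vol-2", "b9", b"other volume")
```

Keeping the interface this narrow means the rest of the core engine does not need to know whether a block sits on an attached drive or with a remote service.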

In some embodiments, an adapter is responsible for performing a cost analysis for a particular type of storage devices based on the technical specification of the storage devices. The storage device management module 192 may compare the cost analyses provided by different adapters with the virtual file system's configuration policy to determine which type of storage services should be used for a particular storage task. The virtual file system may perform this procedure whenever a new type of storage device is attached to the virtual file system or after a predefined time period. By doing so, the virtual file system can optimize the resources offered by different storage devices and provide more efficient services to users of the virtual file system.

FIG. 2D depicts an exemplary adapter/block map 207, which is a data structure used by the block location management module 190 for tracking the locations of the blocks stored at the different storage devices. In some embodiments, the adapter/block map 207 is also referred to as an “object-storage mapping table.” The adapter/block map 207 has a unique adapter ID 207-1. The virtual file system may have one or more data volumes (207-3, 207-9) using the same adapter associated with the adapter/block map 207. For each data volume, there is a unique volume ID 207-5 and a set of (block ID, block location, storage tier level) triplets 207-7. In some embodiments, the storage tier level indicates the cost of having a block stored in a respective storage device associated with the adapter 207. In some other embodiments, a specific cost value is provided in the adapter/block map to indicate the cost for allocating blocks at a particular storage device or devices. When the virtual file system decides to use the adapter associated with the adapter/block map 207 to store a set of blocks associated with a particular data volume at the corresponding storage devices, a new set of entries like 207-3 to 207-7 will be inserted into the adapter/block map 207. In some embodiments, the virtual file system may choose two or more adapters for storing different subsets of blocks associated with the same data volume. This may happen if the different subsets of blocks correspond to different types of files for which the virtual file system has different configurations. In some embodiments, the same data volume may be stored at multiple locations based on the virtual file system's data redundancy policy. In some embodiments, the virtual file system converts the adapter/block map 207 into an object and saves the object in a block at a predefined location in a particular storage device. In some embodiments, the blocks associated with a particular adapter 207 are not organized into different data volumes. In this case, the volume ID 207-5 is not needed in the adapter/block map. By doing so, a block may be shared by multiple volumes if the block's block ID appears in multiple data volumes' metadata as described below in connection with FIG. 2F.
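
One possible in-memory shape for such a map is sketched below. The names `BlockEntry` and `AdapterBlockMap`, the string form of a block location, and the sample values are assumptions for illustration; only the (block ID, block location, storage tier level) triplet and the per-volume grouping come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple, Optional


class BlockEntry(NamedTuple):
    """One (block ID, block location, storage tier level) triplet."""
    block_id: str
    location: str  # device-specific location, shown here as a path-like string
    tier: int


@dataclass
class AdapterBlockMap:
    adapter_id: str
    # volume ID -> block ID -> triplet
    volumes: Dict[str, Dict[str, BlockEntry]] = field(default_factory=dict)

    def add(self, volume_id: str, entry: BlockEntry) -> None:
        """Record where a block of the given volume is stored."""
        self.volumes.setdefault(volume_id, {})[entry.block_id] = entry

    def locate(self, volume_id: str, block_id: str) -> Optional[BlockEntry]:
        """Look up a block's entry, or None if this adapter does not hold it."""
        return self.volumes.get(volume_id, {}).get(block_id)


block_map = AdapterBlockMap(adapter_id="adapter-local")
block_map.add("vol-1", BlockEntry("b1", "/blocks/b1", tier=1))
block_map.add("vol-1", BlockEntry("b2", "/blocks/b2", tier=1))
found = block_map.locate("vol-1", "b2")
missing = block_map.locate("vol-1", "b7")
```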

The data structures described above are used by the core engine for storing the metadata associated with the virtual file system. Based on this metadata information, the core engine can rebuild the virtual file system at a computing device, including rebuilding different types of files within a respective data volume by retrieving the corresponding blocks from the respective storage devices. Although the files within a data volume may have different formats such as text, binary, picture, or multimedia (video or audio) that may be handled by different software applications, they are treated in substantially the same way by the storage devices as a bag of blocks. But for illustrative purposes, a dichotomy of the blocks is sometimes employed when naming the blocks such that a block that stores metadata of a file (e.g., a file node) is referred to as a “metadata block” and a block that stores a portion of content of the file is sometimes referred to as a “content block” or “data block.”

FIG. 2E depicts an exemplary data structure for a block 209. The block 209 is divided into header 209-1 and block data 209-11. Note that from a file's perspective, the block data 209-11 may be the file's metadata (i.e., the block 209 is a metadata block) or the file's content (i.e., the block 209 is a content block). The block's header 209-1 further includes a block ID 209-3, a tag 209-5, a reference 209-7 to the next block associated with the same file, a reference 209-9 to the data volume that includes the file, and key information 209-10. In some embodiments, the block ID 209-3 is generated by applying a hash algorithm (e.g., SHA-256) to the block data 209-11. The tag 209-5 is used by the core engine for determining how the block was packed and choosing a corresponding application program interface module for reading the unpacked block in connection with rebuilding the file. In some embodiments, the key information 209-10 is a hash of the block data 209-11. The core engine can verify the retrieved block data 209-11 against the key information 209-10 by recalculating the key information using the block data 209-11. A mismatch indicates that the block data may have been corrupted such that the core engine may need to retrieve the same block from another storage device or recover the block using other options such as parity data. In some embodiments, the block data 209-11 is encrypted using a key when the core engine packs the block. As such, the core engine needs to have the same key for decrypting the block data when unpacking the block data back to the original data block or object. By encrypting the block data before it is stored at a storage device managed by a third party, the virtual file system can help to improve the security protection for the data from potential malicious attacks.
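The content-addressable block ID and the integrity check can be sketched as follows. This is an illustration under assumptions: the `Block` class and the `pack_block` and `is_intact` helpers are hypothetical names, the next-block and volume references are omitted, and the key information is assumed to be the same SHA-256 digest as the block ID.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical, simplified form of block 209 (references 209-7/209-9 omitted)."""
    block_id: str   # content-addressable ID (cf. 209-3)
    tag: str        # records how the block was packed (cf. 209-5)
    key_info: str   # hash of the block data used for verification (cf. 209-10)
    data: bytes     # block data (cf. 209-11)


def pack_block(data: bytes, tag: str = "raw") -> Block:
    # Assumption: both the block ID and the key information are SHA-256 digests.
    digest = hashlib.sha256(data).hexdigest()
    return Block(block_id=digest, tag=tag, key_info=digest, data=data)


def is_intact(block: Block) -> bool:
    # Recalculate the hash from the retrieved data and compare it with the stored key info.
    return hashlib.sha256(block.data).hexdigest() == block.key_info
```

If `is_intact` returns `False`, the caller would fall back to another copy of the block or to parity-based recovery.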

In some embodiments, a block stored in the storage devices does not reference other blocks or the data volume with which it is associated. The block's metadata is self-descriptive, e.g., using a content-addressable block ID that characterizes the block data from which it was generated. The content-addressable block ID, as noted above, is generated by applying a hashing algorithm to the block data. The core engine generates a data volume data structure including the identifiers of the blocks associated with a particular data volume and a sequence of processing the blocks when rebuilding the files associated with the data volume.

FIG. 2F depicts an exemplary data structure for a data volume 211. The data volume 211 includes a volume ID 211-1 and a block array 211-3. For each metadata or content block, there is a corresponding entry 211-5 in the block array 211-3. Each entry 211-5 in the block array 211-3 has two attributes, block ID and block index. The block ID is a content-addressable parameter that uniquely identifies a block associated with the data volume and the block index is a parameter that represents the relative position of the block with respect to the other blocks in the data volume. For example, as will be explained below, the core engine converts a node on the commit tree into an object in the object store and packs the object into a metadata block in the volume data repository. The metadata block has a corresponding entry in the data volume structure 211 whose block ID is determined by the metadata associated with the node and whose block index is, by default, zero because the node has references to the other nodes in the commit tree. But an entry corresponding to a content block or data block has both the block ID that corresponds to the content of the block and a non-zero block index that represents the block's relative location, which information is used for rebuilding the file. For example, if a file has five data blocks, their respective block indexes in the data volume 211 could be 0, 1, 2, 3, and 4.

In some embodiments, the virtual file system manages multiple data volumes at the same time. For each data volume, the virtual file system may generate a data structure like the one shown in FIG. 2F. At startup, the virtual file system uses the data structures associated with different data volumes to rebuild them on a computing device. In some embodiments, the virtual file system converts the data structure for a data volume into an object and saves the object in a metadata block at a predefined location in a particular storage device. Because of the self-contained nature of the metadata and content blocks associated with a data volume, a user can rebuild the data volume at any computing device with a preinstalled virtual file system even if the virtual file system has no information about the data volume. Thus, different users can share their data volumes with each other.

The description above provides an overview of how the virtual file system (in particular, the core engine) operates in response to client instructions and multiple exemplary data structures used for supporting the operations. The description below, in connection with the figures starting with FIG. 3A and ending with FIG. 5C, focuses on how the virtual file system performs specific operations in connection with the client requests.

FIG. 3A is a flow chart illustrating a process of initializing a virtual file system at a computing device according to some embodiments. This initialization is a precondition for the virtual file system to perform any other operations. As one of the first steps of system startup, the virtual file system identifies one or more storage devices associated with the virtual file system (301). In some embodiments, the virtual file system selects a location (e.g., a directory in the local file system) in the local hard drive as the default storage device (e.g., tier-1 storage device) for storing the blocks. Subsequently, a user of the virtual file system can add more storage devices to the virtual file system, which may be a thumb drive connected to a USB port of the computing device or a remote cloud storage service that can be accessed through the Internet. In some embodiments, each newly-added storage device is included in the configuration file of the virtual file system so that the virtual file system can reconnect to each of the storage devices at startup.

From the associated storage devices, the virtual file system retrieves a set of blocks (303). In some embodiments, the virtual file system only retrieves a minimum number of blocks that are required to initialize the virtual file system such as the metadata blocks used for generating the commit tree, the metadata blocks associated with the adapter/block map, and the metadata and content blocks associated with the files that are required to be present at the computing device, e.g., per the virtual file system's configuration. Blocks associated with the other files can be retrieved subsequently in an on-demand fashion. In some embodiments, the virtual file system may manage multiple data volumes. Each data volume corresponds to a particular file system hierarchical structure that includes one or more directories and one or more files associated with respective directories. As described above in connection with FIGS. 2D to 2F, the virtual file system may retrieve the metadata and content blocks associated with different data volumes from the storage devices attached to the virtual file system.

Using the retrieved blocks, the virtual file system builds the adapter/block map (305). In some embodiments, the virtual file system unpacks the metadata block associated with the adapter/block map to determine the responsibilities of each adapter such as managing a set of blocks associated with a data volume, retrieving a block from a storage device or pushing a block to a storage device, and updating the adapter/block map accordingly.

Using the retrieved blocks, the virtual file system renders a commit tree for the virtual file system (307). To do so, the virtual file system identifies all the metadata blocks associated with the commit tree, extracts the metadata associated with the file/directory/commit nodes from the blocks, and builds a hierarchical tree structure that links the tree nodes with a respective file/directory/commit node. In some embodiments, the virtual file system assembles the commit tree in a top-down fashion, starting with the commit nodes and ending with every file node or directory node. At the end of the process, the virtual file system chooses one of the commit nodes (if there are multiple commits associated with the virtual file system) as the current commit by having a head node pointing to the current commit node. As will be explained below, the commit tree is similar to a snapshot of the virtual file system at a particular moment, which tracks every update to the virtual file system. The commit tree rendered at the system startup not only has the current status of the virtual file system (e.g., the revision timestamps of every file and directory associated with the virtual file system and their associated content) but also provides a mechanism for a user to arbitrarily revert to a previous revision for the same file or directory. This mechanism is built into the commit tree because each tree node data structure (file node 201, directory node 203, or commit node 205) has a reference to its predecessor tree node.

For example, assume that a virtual file system's commit tree has three branches, each branch beginning with a respective commit node (C1, C2, C3) and ending with a respective file node (F1, F2, F3), and that each of the three tree branches is added to the commit tree when there is an update to a file associated with the three file nodes. In other words, the three tree branches represent three revisions to the file. As noted above, the file node data structure 201 includes a reference 201-3 to its predecessor file node such that F3 references F2 and F2 references F1. Therefore, a user who reaches the file node F3 by traversing down the tree branch containing F3 can revert to either of the two previous revisions to the file by traversing laterally from the file node F3 to the file node F2 and then the file node F1. As depicted in FIGS. 2B and 2C, this mechanism is also applicable to the lateral traversal of directory (or commit) nodes. A more detailed example is provided below in connection with FIGS. 6A to 6D.
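The lateral traversal from F3 back to F1 amounts to following a chain of predecessor references. A minimal sketch, assuming a hypothetical `FileNode` class whose `predecessor` field stands in for the reference 201-3:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class FileNode:
    """Hypothetical, simplified file node: an ID, a block catalog, and a predecessor link."""
    node_id: str
    block_catalog: List[str] = field(default_factory=list)
    predecessor: Optional["FileNode"] = None  # stands in for reference 201-3


def walk_revisions(latest: FileNode) -> Iterator[FileNode]:
    # Yield the given revision first, then each earlier revision in turn.
    node: Optional[FileNode] = latest
    while node is not None:
        yield node
        node = node.predecessor
```

With three nodes linked F3 → F2 → F1, iterating `walk_revisions(f3)` visits the revisions from newest to oldest.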

As noted above, one virtual file system may manage multiple data volumes at the same time. In some embodiments, the virtual file system builds one commit tree for each data volume such that different data volumes have different commit trees. In some other embodiments, the virtual file system builds one commit tree for all the data volumes. Other than this difference, the two approaches share substantially the same underlying mechanism. For simplicity, the description below assumes that there is one commit tree for the virtual file system unless otherwise specified.

Next, the virtual file system traverses the commit tree to build a local file system in the local file system cache in accordance with the virtual file system's configuration (309). In some embodiments, the virtual file system (or the core engine, to be specific) first retrieves blocks associated with those files that are required to be found in the local file system and then unpacks them to rebuild each file in the local file system cache. Note that the file rebuilding process may fail if the virtual file system is unable to retrieve at least one of the blocks necessary for the file rebuilding. As will be explained below in connection with FIG. 4A, the virtual file system sometimes may have to leverage the data redundancy scheme built into the data volume to recalculate the missing blocks if necessary.

One function of the virtual file system is to provide a user-requested file in response to a user request. FIG. 3B is a flow chart illustrating a process of a virtual file system returning a file in response to a file request from a user (e.g., an application) at a computing device according to some embodiments. Note that the process described herein also applies to a user request for a directory of the virtual file system.

In response to a request from an application or a person for a file managed by the virtual file system (311), the core engine performs a lookup of the commit tree for a file node associated with the requested file (313). In some embodiments, the request may specify a particular revision or multiple revisions to the file for retrieval, which may or may not include the latest revision of the file. By default, the request is for the latest revision of the file. In some embodiments, the request is a user click on an icon that has an associated file node ID. After receiving the request, the core engine identifies the file node ID and then checks the state of the file node using the identified file node ID (315).

If the file's state is either local or modified, there is a valid, local copy of the requested file at the local file system cache. In some embodiments, the state “modified” may include the addition of a new file to the virtual file system. The virtual file system identifies the location of the requested file in the local file system cache by querying the file node and then fetches the file from the local file system cache (317-A). In some embodiments, the virtual file system accesses the file using a relative path associated with the file in the local file system cache. Otherwise (i.e., when there is no valid local copy), the virtual file system initiates an operation to fetch the file from a storage device within the storage cloud (317-B). A detailed description of the operation is provided below in connection with FIG. 3C. In either case, the virtual file system returns the requested file to the requesting application or user (319).
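The state check in steps 315 to 319 can be sketched as a small dispatch function. The names below are hypothetical; in particular, the disclosure names only the “local” and “modified” states, so `REMOTE` is an assumed label for any state without a valid local copy, and `fetch_from_storage` stands in for the operation of FIG. 3C.

```python
from enum import Enum
from typing import Callable, Dict


class FileState(Enum):
    LOCAL = "local"
    MODIFIED = "modified"
    REMOTE = "remote"  # assumed label: no valid local copy exists


def get_file(node_id: str,
             states: Dict[str, FileState],
             local_cache: Dict[str, bytes],
             fetch_from_storage: Callable[[str], bytes]) -> bytes:
    # Step 315: check the state of the file node.
    state = states[node_id]
    if state in (FileState.LOCAL, FileState.MODIFIED):
        # Step 317-A: a valid local copy exists in the local file system cache.
        return local_cache[node_id]
    # Step 317-B: fetch the file from a storage device.
    return fetch_from_storage(node_id)
```

Either branch returns the file content to the caller, matching step 319.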

FIG. 3C is a flow chart illustrating a process of a virtual file system retrieving a set of data blocks associated with a file from a storage device according to some embodiments. As noted above, there is an inter-node referencing mechanism that links the file/directory/commit nodes together into a commit tree like the one shown in FIG. 6D. Among the three types of node, the commit node is at the root of the commit tree, pointing to the root directory node, and each file node corresponds to a leaf node of the commit tree while all the directory nodes are located in-between the commit node and the file nodes (note that an empty directory node may correspond to a leaf node). A commit tree may have multiple tree branches, each branch corresponding to a respective revision of a file in the virtual file system.

The core engine identifies a first file node referencing the latest revision of the file in the commit tree (321). Note that the first file node may or may not be part of the last commit tree branch depending on what causes the commit of the latest tree branch. From the first file node, the core engine traverses laterally to a second file node in the commit tree (but part of a different tree branch) that references a specific revision of the file (323). Assuming that the specific revision is the client-requested one, the core engine extracts the block catalog from the second file node and identifies a set of blocks associated with the second file node (325). As explained above, an adapter is then selected (327) for retrieving the blocks from the corresponding storage devices (329). In some embodiments, the core engine may choose multiple adapters for retrieving the blocks from the corresponding storage devices by assigning a subset of the blocks to a respective adapter in a predefined manner (e.g., randomly) so as to balance the load at different storage devices and improve the virtual file system's efficiency.

In some embodiments, a specific revision of a file may be retrieved by synchronizing the commit tree of the virtual file system to a particular revision of the commit tree that includes the specific revision of the file. Because the entire commit tree has been synchronized, the file of choice is also automatically synchronized to the user-desired revision.

FIG. 3D is a flow chart illustrating a process of a virtual file system generating metadata for a new revision of a file and synchronizing the new revision with a storage device according to some embodiments. In some embodiments, this process may be triggered in response to a client request to add a new file to the virtual file system or update an existing file in the virtual file system. As noted above, the virtual file system typically adds a new branch to the commit tree to log the transactions happening to the virtual file system between the last commit and the current commit. In some embodiments, the core engine generates the new tree branch in a bottom-up fashion. For example, the core engine first generates a new file node referencing one or more blocks associated with the new/updated file (331). Next, the core engine generates a new directory node referencing the new file node (333) and generates a new commit node referencing the new directory node (335). In some embodiments, the core engine may iterate the two steps multiple times if there are multiple directory layers separating the file node from the commit node. The generation of the new commit node implies the creation of the new commit tree branch. The core engine then adds the new commit tree branch to the commit tree (337). Finally, the core engine synchronizes the virtual file system with the storage devices by pushing the new blocks associated with the file node to the respective storage devices (339).
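The bottom-up construction of a new branch (steps 331 to 335) can be sketched as follows. This is a simplified illustration with hypothetical names (`TreeNode`, `build_branch`); it ignores sibling files and predecessor references, which a fuller implementation would carry over from the previous revision.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TreeNode:
    """Hypothetical, simplified tree node used for file, directory, and commit nodes."""
    kind: str                              # "file", "directory", or "commit"
    name: str
    children: Tuple["TreeNode", ...] = ()
    block_ids: Tuple[str, ...] = ()        # only meaningful for file nodes


def build_branch(dir_path: List[str], file_name: str, block_ids: List[str]) -> TreeNode:
    # Step 331: new file node referencing the blocks of the new/updated file.
    node = TreeNode("file", file_name, block_ids=tuple(block_ids))
    # Step 333, repeated once per directory layer, innermost directory first.
    for dir_name in reversed(dir_path):
        node = TreeNode("directory", dir_name, children=(node,))
    # Step 335: new commit node referencing the top-level directory node.
    return TreeNode("commit", "commit", children=(node,))
```

For a file `a.txt` under `root/docs`, the resulting branch is commit → `root` → `docs` → `a.txt`.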

FIG. 3E is a flow chart illustrating a process of a virtual file system generating metadata for a deletion of a file and synchronizing the deletion with a storage device according to some embodiments. In some embodiments, the deletion of a particular revision of a file is treated like updating an existing file by adding a new tree branch to the commit tree. For example, the core engine identifies a directory node referencing a file node that references a specific revision of a file in the virtual file system (341). Next, the core engine generates a new directory node based on the identified directory node such that the new directory node no longer references the file node associated with the file revision to be deleted (343). The core engine then generates a new commit node referencing the new directory node (345). In some embodiments, the core engine may iterate the two steps multiple times if there are multiple directory layers separating the file node from the commit node. The generation of the new commit node implies the creation of the new commit tree branch. The core engine then adds the new commit tree branch to the commit tree (347). Finally, the core engine synchronizes the virtual file system with the storage devices by pushing the new blocks associated with the file node to the respective storage devices (349).

FIG. 4A is a flow chart illustrating a process of a virtual file system processing metadata blocks and content blocks retrieved from a storage device according to some embodiments. As described above in connection with FIG. 3C (e.g., step 329), the core engine turns a request for a file or even a data volume into requests for a set of metadata and content blocks and then fetches each block from a respective hosting storage device using an appropriate adapter. The process depicted in FIG. 4A provides more details of how the core engine processes each tree node in connection with retrieving a client-requested file.

Assuming that the core engine reaches a tree node in a commit tree branch (401), the core engine first checks whether the corresponding volume data is present in the volume data repository or not (403). If not (403—No), the core engine identifies the tree node ID of the tree node and optionally a block catalog (if the tree node is a file node) and then fetches the corresponding metadata blocks and content blocks from the respective storage devices (405). At a predefined moment (e.g., after fetching one block), the core engine checks if all the blocks have been fetched or not (407). In some embodiments, there are multiple copies of a block at different storage devices, each copy having a respective priority defined by the core engine. The core engine starts with an adapter for retrieving a block from a respective storage device with the highest priority. If this attempt fails, the core engine may choose another adapter for retrieving the block from a respective storage device with the next highest priority. In some embodiments, if the core engine determines that it cannot finish the block fetching after a predefined time period (407—No), the core engine may need to recover the missing data using either the parity data or data at a mirroring site (409). A more detailed description of the missing data recovery process is provided below in connection with FIG. 4B.
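The priority-ordered retrieval with fallback can be sketched as follows. The adapter interface here is an assumption made for illustration: each adapter is modeled as a callable that returns the block bytes, returns `None` if the block is absent, or raises `OSError` on an I/O failure, and a larger priority number is assumed to mean a higher priority.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical adapter interface: block ID in, block bytes (or None) out.
Adapter = Callable[[str], Optional[bytes]]


def fetch_block(block_id: str,
                adapters: Iterable[Tuple[int, Adapter]]) -> Optional[bytes]:
    # Try storage devices from the highest priority to the lowest.
    for _, adapter in sorted(adapters, key=lambda pair: pair[0], reverse=True):
        try:
            data = adapter(block_id)
        except OSError:
            data = None  # treat an I/O failure like a missing block
        if data is not None:
            return data
    # No copy could be retrieved; the caller would fall back to recovery (step 409).
    return None
```

A `None` result corresponds to the case where the missing block has to be recovered from parity data or a mirroring site.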

If the block fetching is completed (407—Yes) or if all the blocks are already present (403—Yes), the core engine then unpacks the volume data (e.g., the metadata and content blocks) to extract one or more data items from each block (411). In some embodiments, a data item within a metadata block could be a file node object, a directory node object, or a commit node object and a data item within a content block is a file's entire or partial content. As described above in connection with FIG. 2E, each block has an associated tag for unpacking the block. In some embodiments, the data within a block has been serialized when the block was generated. The unpacking of the block is a process that deserializes the block so that the core engine can access each entry within a particular data structure as shown in FIGS. 2A to 2C.

For each extracted item, the core engine checks whether it is metadata or file content (415). If the data item is part of a file's content (415—No), the core engine puts the item into the block store (419), where it will be combined with the other contents of the same file to form a copy of the file in the local file system cache. Otherwise (415—Yes), the extracted item is an object corresponding to one of a file node, a directory node, or a commit node, which is moved by the core engine into the object store (417). The core engine then determines whether the object is a directory node object or a file node object (421). If the object corresponds to a directory node, the core engine recursively enumerates the directory node's child nodes and processes each child node (423—B), which may be another directory node or a file node. Note that the enumeration of the child nodes is basically a repetition of the process against each child node as described above in connection with steps 401 to 421. If the object corresponds to a file node, the core engine then extracts the block IDs from the file node's block catalog and processes each block ID in a substantially similar manner (423—A), e.g., checking if the content block is present in the volume data repository, fetching the block from a storage device, unpacking the content block.

As described above in connection with FIG. 4A, when the core engine is unable to fetch a block in connection with a client request, it will have to recover the missing block using other information under its control. In some embodiments, the core engine recovers the missing data using pre-computed parity data. FIG. 4B is a flow chart illustrating a process of a virtual file system computing missing metadata or content using the pre-computed parity data retrieved from a storage device according to some embodiments. A more detailed description of how the parity data are computed is provided below in connection with FIG. 5C.

To recover the missing data, the core engine retrieves the parity blocks associated with the missing blocks from the storage devices (431). If the core engine is unable to retrieve the parity blocks (433—No), the attempt to recover the missing blocks fails and the core engine notifies the client that it is unable to provide the requested file. Otherwise (433—Yes), the core engine selects the parity data to rebuild the missing blocks (435). In some embodiments, a file node's block catalog not only has the block IDs of the content blocks of a file but also stores the block IDs of the parity blocks of the same file. In some embodiments, there is a parity data volume at one or more storage devices for storing parity data associated with the virtual file system. As will be explained below in connection with FIG. 5C, each parity block is generated by processing the other file-related blocks that may be part of the same file or from other files. Given the block ID of a missing block, the core engine checks whether it has all the components necessary for rebuilding the missing block (437). If not (437—No), the core engine may identify the missing component (i.e., another block) and fetch the missing component from the storage devices (439). If all the missing components are retrieved (441—Yes), the core engine can compute the original missing block using the parity blocks (447). If at least one of the missing components is not retrieved (441—No), the core engine may recursively fetch the missing components (443) until either all the components required for generating a missing block are found (445—Yes) or the core engine stops the recursive process due to its failure to fetch at least one missing component (445—No) or other possible reasons (e.g., the client request for the file has not been satisfied for more than a predefined time period).

As noted above, the virtual file system synchronizes with the storage devices in the storage cloud either periodically or in an on-demand fashion. By doing so, the virtual file system not only provides better protection for the files stored therein but also expands its management capacity beyond the capacity of the computing device on which the virtual file system runs. In addition, this procedure ensures that the same user or another user may be able to access the file updated at the computing device.

FIG. 5A is a flow chart illustrating a process of a virtual file system synchronizing metadata and data with a storage device according to some embodiments. Upon receiving a request to synchronize the virtual file system with one or more storage devices (501), the core engine traverses the commit tree to generate a set of blocks associated with the virtual file system (503). In some embodiments, this traversal process happens at the data volume level such that there is a set of blocks associated with each individual data volume. Before pushing the blocks to the storage devices, the core engine may compute parity data for the blocks to ensure that some missing blocks may be recovered using the parity data (505). In some embodiments, the core engine may also encrypt the blocks to prevent unauthorized access if the blocks are stored in a cloud storage service provided by a third party. In some embodiments, based on the virtual file system's configuration policy, the core engine selects one or more adapters for the set of blocks to be pushed over to the storage cloud (507). As noted above, an adapter is usually responsible for dealing with a specific type of storage device. Multiple adapters may be necessary if the set of blocks should be partitioned into multiple subsets and stored at different storage devices. Finally, the core engine pushes each subset of the blocks to a corresponding storage device or devices using the chosen adapter (509).

In sum, the synchronization of a virtual file system with the storage cloud is a process of converting the virtual file system into a plurality of metadata or content blocks and pushing each block into a respective storage device. In some embodiments, the conversion is to serialize each tree node into an object in the object store and a piece of file content into a content block in the block store. FIG. 5B is a flow chart illustrating a process of a virtual file system serializing metadata and data to be synchronized with a storage device according to some embodiments.

Unlike the process of updating the commit tree in connection with the addition or modification to the virtual file system, which is performed in a bottom-up fashion, the synchronization of the virtual file system proceeds in a top-down manner that starts with the commit node of the commit tree as the first tree node (511).

For each tree node, the core engine serializes the tree node's metadata into an object in a predefined order so that, when the core engine unpacks the object, it understands what metadata corresponds to what bytes of the object (513). Next, the core engine generates an object ID for the serialized metadata using, e.g., SHA-256 (515). In some embodiments, the object ID is a content-based address so that one content block may be shared by multiple files. The core engine stores the object ID and the serialized data into an object in the object store (517) and inserts an entry for the object into the data volume's associated block array (519).
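Steps 513 to 517 can be illustrated with a short sketch. The serialization format is not specified above, so JSON with sorted keys is used here purely as an assumed stand-in for "a predefined order"; the helper names `serialize_node` and `store_object` are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def serialize_node(metadata: Dict[str, Any]) -> bytes:
    # Assumed format: JSON with sorted keys, so that equal metadata always
    # produces identical bytes regardless of insertion order (step 513).
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def store_object(object_store: Dict[str, bytes], metadata: Dict[str, Any]) -> str:
    payload = serialize_node(metadata)
    object_id = hashlib.sha256(payload).hexdigest()  # step 515
    object_store[object_id] = payload                # step 517
    return object_id
```

Because the object ID depends only on the serialized bytes, storing identical metadata twice yields a single entry in the object store.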

The core engine then checks whether the tree node being processed is a directory node or file node (521). If it is a directory node, the core engine selects one of the child directory nodes (523) as the current tree node and returns to process the child directory starting from step 513. If it is a file node, the core engine then partitions the file's content into one or more content blocks (525). In some embodiments, the block size may be determined by a policy in the virtual file system's configuration such that different files have different block sizes. For example, for immutable contents such as video or picture, the policy may specify the entire file as one block; and for those mutable contents such as an MS-Word document, the policy may specify a smaller block size to make the delta compression more efficient.

For each piece of file content, the core engine generates a block ID using, e.g., SHA-256 (527), stores the block ID and the content into a content block in the block store (529), and inserts an entry for the content block into the data volume's block array (531). As noted above in connection with FIG. 2F, an entry in the block array for a content block includes a block ID and a block index. The block index identifies the content's relative position in the file.
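Steps 525 to 531 can be sketched as follows, assuming fixed-size partitioning and zero-based block indexes as in the five-block example of FIG. 2F. The function names and the dictionary-based block store are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def partition_into_blocks(content: bytes, block_size: int
                          ) -> Tuple[Dict[str, bytes], List[Tuple[str, int]]]:
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    block_store: Dict[str, bytes] = {}        # block ID -> content
    block_array: List[Tuple[str, int]] = []   # (block ID, block index) entries
    for index, offset in enumerate(range(0, len(content), block_size)):
        piece = content[offset:offset + block_size]      # step 525
        block_id = hashlib.sha256(piece).hexdigest()     # step 527
        block_store[block_id] = piece                    # step 529
        block_array.append((block_id, index))            # step 531
    return block_store, block_array


def rebuild_file(block_store: Dict[str, bytes],
                 block_array: List[Tuple[str, int]]) -> bytes:
    # Concatenate the blocks in block-index order to recover the file content.
    ordered = sorted(block_array, key=lambda entry: entry[1])
    return b"".join(block_store[block_id] for block_id, _ in ordered)
```

Identical pieces hash to the same block ID, so they occupy one entry in the block store while still appearing at each of their positions in the block array.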

Finally, the core engine determines whether it has processed the last tree node (533). If not (533—No), the core engine then selects the next directory or file node as the current tree node (535) and returns to process that node starting from step 513. Otherwise (533—Yes), the core engine has synchronized the entire virtual file system represented by the commit tree.

As noted above, data redundancy is used for protecting the virtual file system from potential data unavailability at a particular storage device. In some embodiments, the data redundancy is implemented at a block level. One scheme is to duplicate a block and store the duplicated block at a particular storage device. In some embodiments, this scheme may be more appropriate for metadata blocks. Another scheme is to compute parity data blocks from the existing blocks and store the parity data blocks at a particular storage device. This scheme is often used for content blocks. Two particular embodiments of the second scheme are disclosed below in connection with FIGS. 5C and 5D.

FIG. 5C is a block diagram illustrating an intra-file parity computation scheme according to some embodiments. For example, a file D is partitioned into content blocks D_1, D_2, . . . , D_N. According to this scheme, the parity blocks are computed from a group of neighboring content blocks associated with the same file (for this purpose, the last block D_N is deemed to be next to the first block D_1 to form a loop), e.g.,

P_{1,2,3} = D_1 ⊕ D_2 ⊕ D_3

P_{2,3,4} = D_2 ⊕ D_3 ⊕ D_4

. . .

P_{N-1,N,1} = D_{N-1} ⊕ D_N ⊕ D_1

P_{N,1,2} = D_N ⊕ D_1 ⊕ D_2

Note that the content block D_1 contributes to the computation of three parity blocks, P_{1,2,3}, P_{N-1,N,1}, and P_{N,1,2}. Therefore, if the content block D_1 is missing, the core engine can rebuild the content block D_1 by retrieving the three parity blocks and performing the following operation:

D_1 = P_{1,2,3} ⊕ P_{N-1,N,1} ⊕ P_{N,1,2}
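A minimal sketch of the intra-file scheme, assuming equal-length blocks and the hypothetical helpers `xor_blocks` and `circular_parity`. Expanding the three parity blocks that include D_1 leaves D_1 ⊕ D_3 ⊕ D_{N-1}, so the three-parity operation yields D_1 when D_3 and D_{N-1} are the same block (N = 4, as in the usage below). For other N, one way to rebuild D_1 is to combine a single parity block with the two surviving content blocks it covers, e.g., D_1 = P_{1,2,3} ⊕ D_2 ⊕ D_3.

```python
from functools import reduce
from typing import List


def xor_blocks(*blocks: bytes) -> bytes:
    # Byte-wise exclusive-or of equal-length blocks.
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def circular_parity(blocks: List[bytes]) -> List[bytes]:
    # parity[i] covers blocks i, i+1, i+2 (zero-based), wrapping around the end,
    # i.e., parity[0] is P_{1,2,3} and parity[n-1] is P_{N,1,2}.
    n = len(blocks)
    return [xor_blocks(blocks[i], blocks[(i + 1) % n], blocks[(i + 2) % n])
            for i in range(n)]
```

With four blocks `d`, `p = circular_parity(d)` gives `xor_blocks(p[0], p[2], p[3]) == d[0]`, and `xor_blocks(p[0], d[1], d[2]) == d[0]` holds for any number of blocks of three or more.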

FIG. 5D is a block diagram illustrating an inter-file parity computation scheme according to some embodiments. In this example, the file D is the target file that requires parity protection. Files A and B are two files that are readily accessible to a user and they are not necessarily part of the virtual file system. In order to compute the parity blocks, both the files A and B are partitioned into multiple blocks and different blocks from different files are combined by the “exclusive or” operation to generate the parity blocks that are part of the parity file P, e.g.,

$$P_3 = A_4 \oplus B_4 \oplus D_1$$

$$P_4 = A_{N-1} \oplus B_N \oplus D_4$$

$$P_{N-1} = A_1 \oplus B_2 \oplus D_3$$

Because both the files A and B are readily available to the virtual file system, the core engine only needs to retrieve a corresponding parity block from the parity file P in order to recover a block missing from the file D. For example, assuming that the content block $D_1$ is missing, the core engine can rebuild the missing content block by performing the following operation:

$$D_1 = P_3 \oplus A_4 \oplus B_4$$
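
The recovery of $D_1$ from $P_3$ can be sketched as follows. This is a minimal illustration assuming equally sized blocks; the byte values and the dictionaries keyed by block number are hypothetical stand-ins for the partitioned files.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Hypothetical block contents; A and B are the readily available reference files.
a_blocks = {4: b"\x10\x20"}
b_blocks = {4: b"\x0f\x0e"}
d_blocks = {1: b"\xaa\xbb"}

# Parity computation: P_3 = A_4 xor B_4 xor D_1
p3 = xor_blocks(a_blocks[4], b_blocks[4], d_blocks[1])

# Recovery when D_1 is missing: D_1 = P_3 xor A_4 xor B_4
recovered_d1 = xor_blocks(p3, a_blocks[4], b_blocks[4])
```

Only the parity block has to be fetched from storage; the blocks of A and B are read from wherever those files already reside.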

Note that the two parity computation schemes are for illustrative purposes. Those skilled in the art can develop other schemes based on the teachings herein.

FIGS. 6A to 6D are block diagrams illustrating multiple stages of an exemplary commit tree according to some embodiments. As described above, the commit tree is used for tracking the changes to the virtual file system by adding new tree branches at different times. The four block diagrams in FIGS. 6A to 6D are similar to four snapshots of the virtual file system at four different moments. Note that the commit tree is simplified for illustrating the basic structures of and operations applied to a commit tree. One skilled in the art would understand that the commit tree corresponding to an actual virtual file system would be much more complex.

As shown in FIG. 6A, the commit tree at the moment of T₁ has one tree branch led by the head node 601. In some embodiments, the commit tree always has a head node that points to the most recent commit node (which is commit node C1 in FIG. 6A). The commit node C1 references one directory node D1 and the directory node D1 references two file nodes F1 and F2, each file node having an associated block catalog (603 or 605). In other words, the virtual file system (or a particular data volume) identified by the commit tree has two files.

FIG. 6B depicts the commit tree at the moment of T₂, which adds one tree branch to the commit tree. Note that the head node 607 references the commit node C2, which is the most recent commit node. Both the commit node C2 and the directory node D2 have references (604, 606) to their respective predecessor commit node C1 and directory node D1. The directory node D2 has a reference 611 to the file node F1 and a reference 612 to the file node F3. In other words, from T₁ to T₂, the file associated with the file node F1 remains unchanged. But the file associated with the file node F2 has been changed. At the moment of T₁, the file associated with the file node F2 has five content blocks E1 to E5. At the moment of T₂, the file associated with the file node F2 still has five content blocks with the content block E3 replaced by the content block E6. Because the two versions of the file share the four content blocks E1, E2, E4, and E5, the second version associated with the file node F3 only has one new block ID corresponding to the new content block E6. In other words, the commit tree automatically performs differential (or delta) compression by pointing to those old blocks that have not been changed. Note that the differential compression scheme works regardless of whether the changes to the content of a file occur at the beginning of the file (e.g., within E1) or at the end of the file (e.g., within E5). Whenever the changed content block exceeds the predefined block size, one or more new content blocks will be generated to deal with the content overflow.
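
The block sharing between the two revisions can be sketched as follows. This is a minimal illustration, not the disclosed data layout: the class and field names are hypothetical, block catalogs are modeled as tuples of block IDs, and the catalog given for F1 is invented because the description does not list its blocks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FileNode:
    name: str
    block_catalog: Tuple[str, ...]


@dataclass(frozen=True)
class DirectoryNode:
    name: str
    files: Tuple[FileNode, ...]
    predecessor: Optional["DirectoryNode"] = None


@dataclass(frozen=True)
class CommitNode:
    name: str
    root: DirectoryNode
    predecessor: Optional["CommitNode"] = None


def block_ids(commit: CommitNode) -> set:
    """All block IDs referenced by the files of one commit."""
    return {b for f in commit.root.files for b in f.block_catalog}


def new_blocks(commit: CommitNode) -> set:
    """Block IDs referenced by this commit but by none of its predecessors."""
    seen = set()
    older = commit.predecessor
    while older is not None:
        seen |= block_ids(older)
        older = older.predecessor
    return block_ids(commit) - seen


f1 = FileNode("F1", ("A1", "A2"))  # hypothetical catalog, not given in the text
f2 = FileNode("F2", ("E1", "E2", "E3", "E4", "E5"))
f3 = FileNode("F3", ("E1", "E2", "E6", "E4", "E5"))  # E3 replaced by E6

d1 = DirectoryNode("D1", (f1, f2))
c1 = CommitNode("C1", d1)
d2 = DirectoryNode("D2", (f1, f3), predecessor=d1)
c2 = CommitNode("C2", d2, predecessor=c1)
head = c2  # the head node references the most recent commit node

added_at_t2 = new_blocks(c2)
```

In this sketch only E6 has to be stored for the second revision; the other four blocks of F3 are references to blocks already held for F2.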

FIG. 6C depicts the commit tree at the moment of T₃, which adds one more tree branch to the commit tree. Note that the head node 615 now references the most recent commit node C3. The directory node D3 has a reference 617 to the file node F2 and a reference 618 to the file node F4. In other words, between T₂ and T₃, the virtual file system has reverted back to the earlier version of the file associated with the file node F2. For example, a user of the virtual file system may decide to delete the later version of the file associated with the file node F3. In addition, a new file associated with the file node F4 is added to the virtual file system and it has a block catalog 619.

FIG. 6D depicts the commit tree at the moment of T₄ when another tree branch led by the head node 612 is added to the commit tree. Note that the directory node D4 has only one file reference 625 to the file node F4, suggesting that the virtual file system has deleted the other files referenced by the directory node D3 and kept only the file associated with the file node F4.

In some embodiments, the virtual file system is required to enable a user to access a set of files residing in the storage cloud (especially those remote storage devices) from different geographical locations. Note that the user may be the same person who accesses the set of files using different computing devices or different persons who access the set of files using different computing devices. To meet this requirement, the virtual file system needs to have a mechanism to resolve potential conflicts between the updates to the set of files from different computing devices.

FIG. 6E depicts how the virtual file system resolves potential conflicts between different revisions to a virtual file system according to some embodiments. For simplicity, the diagram in FIG. 6E focuses on the commit nodes corresponding to different revisions to the virtual file system. As noted above, the first revision is made at the moment of T₁ and represented by the commit node C1; and the second revision is made at the moment of T₂ and represented by the commit node C2. Note that there is no conflict for the first two revisions because both revisions to the virtual file system are synchronized with the storage cloud such that a file stored in the storage cloud (in the form of blocks) is the same as the file stored at the virtual file system's local file system cache. Then, at the moments of T₃ and T₃′, two users independently make changes to the same revision of the virtual file system from their respective computing devices by, e.g., adding new files to the virtual file system, deleting existing files from the virtual file system, or modifying existing files in the virtual file system. As a result, two different commit nodes C3 and C3′ are generated at the two computing devices from which the changes to the virtual file system were made. This situation may occur when the two users are using the virtual file system in an offline mode.

Subsequently, when one of the two users (e.g., the user who generates the commit node C3) first synchronizes its virtual file system with the storage cloud, the new commit node C3 will be sent to the storage cloud to replace the commit node C2 as the virtual file system's current commit node. But when another user (e.g., the user who generates the commit node C3′) subsequently synchronizes its virtual file system with the storage cloud, a potential conflict may occur because the commit node C3′ still references the commit node C2 within the virtual file system whereas the metadata returned from the storage cloud (e.g., at step 307 of FIG. 3A) may indicate that there is another commit node C3 referencing the commit node C2 (note that the other user has no prior knowledge of the commit node C3). In this case, the virtual file system may initiate a conflict resolution process to merge the revision of the virtual file system corresponding to the commit node C3 with the revision of the virtual file system corresponding to the commit node C3′.
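
The detection of such a potential conflict can be sketched as follows. This is a simplified illustration with hypothetical names (`Commit`, `sync_action`); it compares only the head commits and their immediate predecessors, whereas an actual commit tree may need to walk a longer chain of references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    commit_id: str
    parent_id: Optional[str]


def sync_action(local_head: Commit, cloud_head: Commit) -> str:
    """Decide what a synchronization request should do (one-step history only)."""
    if local_head.commit_id == cloud_head.commit_id:
        return "in-sync"
    if local_head.parent_id == cloud_head.commit_id:
        return "push"   # the local commit directly follows the cloud's current commit
    if cloud_head.parent_id == local_head.commit_id:
        return "pull"   # the cloud is one commit ahead of the local copy
    return "merge"      # both sides moved on independently: potential conflict


c2 = Commit("C2", "C1")
c3 = Commit("C3", "C2")
c3_prime = Commit("C3'", "C2")

first_user = sync_action(c3, c2)         # cloud still at C2 when C3 arrives
second_user = sync_action(c3_prime, c3)  # cloud already replaced C2 with C3
```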

In some embodiments, the virtual file system's configuration policy may specify a set of rules on how to resolve conflicts between two different revisions. One exemplary rule is that, for certain types of files, the virtual file system may automatically choose one of the two revisions over the other one. For example, the chosen one may be the one that has a more recent time stamp. Another exemplary rule is that a file only introduced into one of the two revisions should be retained in the merged revision (as represented by the commit node C4). Yet another exemplary rule is that a file only removed from one of the two revisions should be kept in the merged revision as long as such operation does not cause any other information loss to the virtual file system.

In some embodiments, both revisions may have a modified version of the same file with different contents. In this case, the virtual file system may identify the file and raise an alert to the user by, e.g., generating a pop-up window or adding a special mark to the file in the virtual file system to indicate that two different contents may exist for the same file. When a user clicks a link/button in the pop-up window or the file, the virtual file system will invoke an application to open both versions of the same file in two separate windows so that the user can manually merge the two contents into one content. In some embodiments, the application is configured to highlight the content differences between the two versions to help the user choose one version over the other, or replace both by providing a new one. At the end of the conflict resolution process, a new commit node C4 is generated as the current commit node of the virtual file system. Note that the rules described above are applicable not only to a file but also to files or directories or even the entire virtual file system.
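
Some of the exemplary rules can be sketched as a three-way comparison against the common predecessor revision. This is a minimal illustration under stated assumptions: each revision is modeled as a hypothetical mapping from file path to a content identifier, the function name `merge_revisions` is invented, the time-stamp rule is omitted, and a file present in only one revision is always retained (a literal reading of the rules for files introduced into, or removed from, only one revision).

```python
def merge_revisions(base: dict, ours: dict, theirs: dict):
    """Return (merged, conflicts) for two revisions derived from the same base."""
    merged, conflicts = {}, []
    for path in sorted(set(ours) | set(theirs)):
        a, b, o = ours.get(path), theirs.get(path), base.get(path)
        if a == b:
            merged[path] = a                          # same content on both sides
        elif a is None or b is None:
            merged[path] = a if a is not None else b  # present on one side only: retain
        elif a == o:
            merged[path] = b                          # only the other side modified it
        elif b == o:
            merged[path] = a                          # only this side modified it
        else:
            conflicts.append(path)                    # modified differently on both sides
    return merged, conflicts


# Hypothetical revisions: C2 is the common base, C3 and C3' diverge from it.
c2 = {"a.txt": "v1", "b.txt": "v1"}
c3 = {"a.txt": "v2", "b.txt": "v1", "c.txt": "v1"}
c3_prime = {"a.txt": "v3", "b.txt": "v1", "d.txt": "v1"}

merged, conflicts = merge_revisions(c2, c3, c3_prime)
```

Paths listed in `conflicts` correspond to the files for which the user would be alerted and asked to merge the two contents manually before the commit node C4 is generated.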

FIG. 7 is a block diagram illustrating a client or server device equipped with a virtual file system used for operations described above according to some embodiments. A client or server computer 700 that runs the virtual file system typically includes one or more processing units (CPUs) 702 for executing modules, programs and/or instructions stored in memory 714 and thereby performing processing operations; one or more network or other communications interfaces 704; memory 714; and one or more communication buses 712 for interconnecting these components. In some embodiments, a client or server computer 700 includes a user interface 706 comprising a display device 708 and one or more input devices 710. In some embodiments, memory 714 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some embodiments, memory 714 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some embodiments, memory 714 includes one or more storage devices remotely located from the CPU(s) 702. Memory 714, or alternately the non-volatile memory device(s) within memory 714, comprises a computer readable storage medium. In some embodiments, memory 714 or the computer readable storage medium of memory 714 stores the following programs, modules and data structures, or a subset thereof:

-   an operating system 716 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
-   a communications module 718 that is used for connecting the client or server computer 700 to other computers via the one or more communication network interfaces 704 (wired or wireless) and one or more communication networks 712, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
-   one or more applications 720, such as an application 720-1 for playing video streams and an application 720-1 for document editing;
-   a local file system 730 that manages one or more files 730-1, 730-2;
-   a virtual file system supported by a core engine 740 that includes a management module 742, a configuration module 744, a cache module 746 that further includes a local file system cache 748, a block store 750, an object store 752, and a volume data repository 754;
-   a block retrieval module 756 for retrieving blocks from different storage devices;
-   a block replication module 758 for implementing the virtual file system's data redundancy policy (e.g., generating parity data);
-   a block recovery module 760 for recovering a missing block using, e.g., parity data;
-   a block location management module 762 for tracking the location of each block and the corresponding adapter for accessing the block; and
-   a storage devices management module 764 for managing the retrieval and synchronization operations between the virtual file system and the storage devices.

Note that a more detailed description of the above identified elements has been provided above in connection with FIG. 1D. Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 714 may store a subset of the modules and data structures identified above. Furthermore, memory 714 may store additional modules or data structures not described above.

Although FIG. 7 shows an instance server used for performing various operations in connection with the operation of the virtual file system as illustrated above, FIG. 7 is intended more as a functional description of the various features which may be present in a set of one or more computers rather than as a structural schematic of the embodiments described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some items shown separately in FIG. 7 could be implemented on individual computer systems and single items could be implemented by one or more computer systems. The actual number of computers used to implement each of the operations, or the storage devices, and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data at each instance of the virtual file system, the amount of data traffic that a virtual file system must handle during peak usage periods, as well as the amount of data traffic that a virtual file system must handle during average usage periods.

FIGS. 8A to 8F are exemplary screenshots of a virtual file system according to some embodiments. In particular, FIG. 8A depicts a window corresponding to the virtual file system 800 running at a computing device. The virtual file system 800 includes multiple data volumes, each data volume having a particular icon in the window. A user can access files associated with a particular data volume by clicking on the corresponding icon in the window. For example, FIG. 8B depicts a window corresponding to the data volume 820 after a user selection of the icon 807 in FIG. 8A. For illustration, the data volume 820 includes a directory 821 with a title “2009” and a type “File Folder” 825. The status 823 of the directory is “computer,” which means that the directory 821's structure has been rebuilt at the computing device's local file system. In addition, the data volume 820 includes multiple DOC files, one of which is a file 827 having a title “Meeting Agenda.” The status 829 of the file is “cloud,” which means that the file 827's content is located at one or more storage devices within the storage cloud and has not been rebuilt at the computing device's local file system. The lower portion of the window in FIG. 8B depicts the storage devices (or services) that support the data volume 820. In this example, the data volume 820 has two storage devices, a local storage 831 having a description 833 (which is a directory at the local file system) and a remote cloud storage service offered by Amazon.

FIG. 8C depicts that the virtual file system includes two newly added files 841, 843. Note that the status of each of the two files is “added” and each file icon has a “+” sign, indicating that the two files have not yet been pushed into any of the two storage devices 842, 844. FIG. 8D depicts the virtual file system after both files 845, 847 have been pushed into the storage cloud. As a result, the status of the two files changes from “added” to “computer,” indicating that the two files are available in the local file system cache as well as in the storage cloud and the two sides are in sync. FIG. 8E depicts the virtual file system after the file 853 is deleted from the local file system cache. Note that the status of the file 851 remains “computer” whereas the status of the file 853 changes from “computer” to “cloud,” indicating that the file 853 is only available in the storage cloud. FIG. 8F depicts the version history of an exemplary file. In this example, the file has two versions with different timestamps. Note that the current version 855 is the same as the oldest version 859, indicating that the virtual file system enables a user to bring back any old version of a file (or even a directory) as the current version for further processing.
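
The three file statuses shown in the screenshots can be summarized as a function of where a file's content currently resides. This is a minimal sketch; the function name `file_status` and its two boolean parameters are hypothetical and not part of the described user interface.

```python
def file_status(in_local_cache: bool, in_storage_cloud: bool) -> str:
    """Map a file's location to the status label shown in FIGS. 8B to 8E."""
    if in_local_cache and in_storage_cloud:
        return "computer"  # available locally and in the cloud, both sides in sync
    if in_local_cache:
        return "added"     # newly added, not yet pushed to any storage device
    if in_storage_cloud:
        return "cloud"     # content only in the storage cloud
    raise ValueError("file is neither in the local cache nor in the storage cloud")


after_adding = file_status(in_local_cache=True, in_storage_cloud=False)
after_push = file_status(in_local_cache=True, in_storage_cloud=True)
after_local_delete = file_status(in_local_cache=False, in_storage_cloud=True)
```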

The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.

1. A computer-implemented method, comprising: at a computing device: receiving a request to convert a revision of a virtual file system at the computing device into a plurality of blocks within at least one storage device; identifying a hierarchical tree structure associated with the revision of the virtual file system, wherein the hierarchical tree structure includes a plurality of tree nodes, each tree node having associated metadata and corresponding to a respective component of the revision of the virtual file system; converting the hierarchical tree structure and associated metadata into the plurality of blocks within the one storage device in a top-down manner, further including: for each tree node: converting the tree node's associated metadata into an object in a serial manner; generating a unique object ID from the object based at least in part on the metadata; storing the object in a respective one of the plurality of blocks within the storage device if the object ID has no matching entry in an object-storage mapping table associated with the virtual file system; and generating a new entry including the object ID in the object-storage mapping table; and storing the object-storage mapping table in a respective one of the plurality of blocks within the storage device.
 2. The computer-implemented method of claim 1, wherein at least one of the tree nodes includes a directory node that corresponds to a directory of the virtual file system.
 3. The computer-implemented method of claim 2, wherein the metadata associated with the directory node includes a directory node identifier, a reference to another directory node, and one or more directory attributes.
 4. The computer-implemented method of claim 1, wherein the object is comprised of an object header and an object body, the object header including the object ID and the object body including the serialized metadata.
 5. The computer-implemented method of claim 4, wherein the object header further includes a tag for determining how the object was constructed in the serial manner and identifying a corresponding application program interface module for deserializing the object in connection with rebuilding the virtual file system.
 6. The computer-implemented method of claim 1, wherein the object ID is generated by applying a hash algorithm to data in the serialized object.
 7. The computer-implemented method of claim 1, wherein at least one of the tree nodes includes a file node that corresponds to a file of the virtual file system and the file includes a plurality of content blocks, the method further comprising: for each content block: generating a unique block ID from the content block; determining whether the block ID has a matching entry in a content block-storage mapping table that includes identifiers of content blocks associated with the virtual file system; if no matching entry is found in the content block-storage mapping table: storing the content block in a respective one of the plurality of blocks within the storage device; and generating a new entry including the block ID in the content block-storage mapping table; if at least one matching entry is found in the content block-storage mapping table: having the file reference a content block that has the same block ID found in the content block-storage mapping table; and storing the content block-storage mapping table in a respective one of the plurality of blocks within the storage device.
 8. The computer-implemented method of claim 7, wherein the referenced content block is associated with an earlier revision of the same file.
 9. The computer-implemented method of claim 7, wherein the referenced content block is associated with a different file of the virtual file system.
 10. The computer-implemented method of claim 1, wherein the computing device is one selected from the group consisting of a desktop computer, a laptop computer, a tablet computer, and a mobile telephone.
 11. The computer-implemented method of claim 1, wherein the computing device has a local file system and the local file system includes directories and files associated with the virtual file system.
 12. A computing device in association with a distributed storage system that includes a plurality of storage devices, comprising: one or more processors; memory; and one or more programs stored in the memory for execution by the one or more processors, the one or more programs comprising instructions for: receiving a request to convert a revision of a virtual file system at the computing device into a plurality of blocks within at least one storage device; identifying a hierarchical tree structure associated with the revision of the virtual file system, wherein the hierarchical tree structure includes a plurality of tree nodes, each tree node having associated metadata and corresponding to a respective component of the revision of the virtual file system; converting the hierarchical tree structure and associated metadata into the plurality of blocks within the one storage device in a top-down manner, further including: for each tree node: converting the tree node's associated metadata into an object in a serial manner; generating a unique object ID from the object based at least in part on the metadata; storing the object in a respective one of the plurality of blocks within the storage device if the object ID has no matching entry in an object-storage mapping table associated with the virtual file system; and generating a new entry including the object ID in the object-storage mapping table; and storing the object-storage mapping table in a respective one of the plurality of blocks within the storage device.
 13. The computing device of claim 12, wherein at least one of the tree nodes includes a directory node that corresponds to a directory of the virtual file system.
 14. The computing device of claim 12, wherein the object is comprised of an object header and an object body, the object header including the object ID and the object body including the serialized metadata.
 15. The computing device of claim 12, wherein the object ID is generated by applying a hash algorithm to data in the serialized object.
 16. The computing device of claim 12, wherein the computing device is one selected from the group consisting of a desktop computer, a laptop computer, a tablet computer, and a mobile telephone.
 17. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computing device having one or more processors and memory storing one or more programs for execution by the one or more processors in association with a distributed storage system that includes a plurality of storage devices, the one or more programs comprising instructions to: receive a request to convert a revision of a virtual file system at the computing device into a plurality of blocks within at least one storage device; identify a hierarchical tree structure associated with the revision of the virtual file system, wherein the hierarchical tree structure includes a plurality of tree nodes, each tree node having associated metadata and corresponding to a respective component of the revision of the virtual file system; convert the hierarchical tree structure and associated metadata into the plurality of blocks within the one storage device in a top-down manner, further including: for each tree node: convert the tree node's associated metadata into an object in a serial manner; generate a unique object ID from the object based at least in part on the metadata; store the object in a respective one of the plurality of blocks within the storage device if the object ID has no matching entry in an object-storage mapping table associated with the virtual file system; and generate a new entry including the object ID in the object-storage mapping table; and store the object-storage mapping table in a respective one of the plurality of blocks within the storage device.
 18. The non-transitory computer readable storage medium of claim 17, wherein at least one of the tree nodes includes a directory node that corresponds to a directory of the virtual file system.
 19. The non-transitory computer readable storage medium of claim 17, wherein the object is comprised of an object header and an object body, the object header including the object ID and the object body including the serialized metadata.
 20. The non-transitory computer readable storage medium of claim 17, wherein the object ID is generated by applying a hash algorithm to data in the serialized object. 